Cleaning up the Other Bucket

The Other category: also known as General, Miscellaneous, My Stuff, or, too often, the shared drive.

The Other category is the junk drawer for all those taxonomy terms that just don’t seem to fit anywhere else. Reaching into it is a gamble, because you never know what you will find: the yo-yo you can’t bring yourself to throw away, a pair of rusty scissors that threatens to impale you while you look for loose change, batteries, baggy ties, the old cell phone full of numbers for people you don’t even remember. When it’s time to clean up, you can either throw it all away or sort through it and put everything in a more appropriate place.

On a recent project, we were tasked with building a faceted taxonomy based in large part on the existing corporate taxonomy. The major challenge was deciding which parts of the existing taxonomy to salvage and which to discard. This was complicated by a previous technology implementation that had allowed employees to add unregulated terms, which by default were added to the taxonomy under a category called “Other.” Because these terms were never vetted, the Other category included duplicate, concatenated, and just plain crazy terms.

Their “junk drawer” held 4148 of the 6529 terms in the taxonomy. That meant only about one-third of the taxonomy (2381 terms) was categorized, which made it easier to determine where those values might fit into the new taxonomy. The other two-thirds, however, gave us no context: it was an undifferentiated list of terms.
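As a quick check on those proportions, the arithmetic below uses only the two totals given above; the variable names are illustrative.

```python
# Totals reported in the post.
TOTAL_TERMS = 6529
OTHER_TERMS = 4148

categorized = TOTAL_TERMS - OTHER_TERMS          # terms already placed in a category
print(categorized)                               # 2381
print(round(categorized / TOTAL_TERMS, 3))       # 0.365, roughly one-third
print(round(OTHER_TERMS / TOTAL_TERMS, 3))       # 0.635, roughly two-thirds
```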

Our choice was between scrapping it all and starting from scratch, or sorting through all the junk in the drawer. We decided to use the categorized third of the taxonomy as the basis for the framework and to establish criteria for sorting the remaining two-thirds.

After determining the necessary facets (17 in total), we focused on what was already categorized, fitting those terms into appropriate locations and discarding any that were no longer necessary. This work ran in parallel with our first criterion for Other terms: was the term a duplicate of, or a synonym for, an already categorized term? Rather than going through the list term by term, we exported the taxonomy into an Excel file, sorted the columns to group all Other terms together, and searched for duplicates and synonyms. Any redundant terms we found had their IDs mapped to the existing categorized terms and were then removed from the list.
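We did this step in Excel, but the same duplicate-and-synonym pass can be sketched in a few lines of Python. This is a minimal sketch, not our actual process: it assumes the export has been read into two dictionaries of term ID to label, and the normalization rules and the hand-built synonym table are hypothetical choices for illustration.

```python
import re


def normalize(label: str) -> str:
    """Lowercase, trim, and collapse punctuation/whitespace so near-identical labels compare equal."""
    label = label.lower().strip()
    label = re.sub(r"[^\w\s]", " ", label)   # treat punctuation (e.g. hyphens) as a separator
    return re.sub(r"\s+", " ", label).strip()


def map_duplicates(categorized, other, synonyms):
    """
    categorized, other: dict of term ID -> label (hypothetical export format).
    synonyms: dict of normalized variant -> normalized preferred label (hypothetical, hand-built).
    Returns (id_map, remaining): Other IDs mapped to a categorized ID, and the Other terms still to sort.
    """
    index = {normalize(label): term_id for term_id, label in categorized.items()}
    id_map, remaining = {}, {}
    for term_id, label in other.items():
        key = normalize(label)
        key = synonyms.get(key, key)          # fold known synonyms onto the preferred label
        if key in index:
            id_map[term_id] = index[key]      # redundant: point its ID at the existing term
        else:
            remaining[term_id] = label        # no match: leave it for the later criteria
    return id_map, remaining


# Illustrative data only.
categorized = {1: "Human Resources", 2: "Annual Report"}
other = {101: "human resources", 102: "HR", 103: "Annual-Report", 104: "Picnic photos"}
id_map, remaining = map_duplicates(categorized, other, {"hr": "human resources"})
print(id_map)      # {101: 1, 102: 1, 103: 2}
print(remaining)   # {104: 'Picnic photos'}
```

Keeping the ID mapping, rather than just deleting the matches, is what later lets existing tags follow the redundant terms to their surviving equivalents.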

The second criterion was how many times a term had been tagged to content. We sorted the Other terms by the count column and, after a brief review, eliminated all the terms that had been added to the taxonomy at some point but were no longer tagged to any content. We then moved on to the high-count items, placing each one in the best location, mapping it to an existing term where applicable, or discarding it if, despite a high count, it was no longer deemed useful.
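The count-based triage can be sketched the same way. This assumes each remaining Other term carries its tag count from the export; the `Term` record and the `high_threshold` cutoff are hypothetical, since the post does not say what counted as “high.”

```python
from typing import NamedTuple


class Term(NamedTuple):
    term_id: int
    label: str
    count: int   # number of content items tagged with this term


def triage_by_count(terms, high_threshold=50):
    """
    Split Other terms by tag count, highest first.
    high_threshold is an illustrative cutoff, not a value from the project.
    """
    ordered = sorted(terms, key=lambda t: t.count, reverse=True)
    unused = [t for t in ordered if t.count == 0]                  # drop after a brief review
    high = [t for t in ordered if t.count >= high_threshold]       # categorize, map, or discard
    middle = [t for t in ordered if 0 < t.count < high_threshold]  # left for the search-log check
    return unused, high, middle


# Illustrative data only.
terms = [Term(1, "Budget FY09", 0), Term(2, "Safety", 120), Term(3, "Widgets", 7)]
unused, high, middle = triage_by_count(terms)
```

Here `unused` holds term 1, `high` holds term 2, and `middle` holds term 3, which is the kind of low-count term the next criterion deals with.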

The final criterion was checking Other terms against a year’s worth of search log data. Terms that had slipped past the duplicate check or had only a low tag count were checked against what users were actually searching for. If a term existed but wasn’t being tagged and had a strong presence in the search logs, we retained it. The search logs also helped us uncover new terms into which we could roll existing Other terminology.
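One way to approximate the search-log check is to count how many logged queries contain each candidate label as a whole phrase. This sketch assumes the log has been reduced to a plain list of query strings; the phrase-matching rule and the `min_hits` cutoff for a “strong presence” are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so queries and labels compare consistently."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def search_hits(labels, queries):
    """Count how many logged queries contain each label as a whole phrase."""
    padded = [f" {normalize(q)} " for q in queries]   # pad so matches respect word boundaries
    hits = Counter()
    for label in labels:
        needle = f" {normalize(label)} "
        hits[label] = sum(needle in q for q in padded)
    return hits


def retain_from_logs(labels, queries, min_hits=10):
    """Keep labels that users search for at least min_hits times (hypothetical cutoff)."""
    hits = search_hits(labels, queries)
    return [label for label in labels if hits[label] >= min_hits]


# Illustrative data only.
queries = ["safety training", "Safety Training schedule", "cafeteria menu"]
kept = retain_from_logs(["Safety Training", "Holiday Party"], queries, min_hits=2)
```

With this data, `kept` is `["Safety Training"]`. Real logs would also need misspellings and plural forms folded together before the counts mean much.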

Though the process was inelegant, we managed to pull out the best existing terms. We felt confident that we had captured what was valuable from the old taxonomy while creating a new faceted taxonomy of only 1127 terms that met the company’s needs.

If we had to do it all over again, would cleaning out the junk drawer still be the best approach? Salvaging what we could from the old taxonomy was more time-consuming and manual than creating a taxonomy from scratch, but I think our approach worked well because our client had a long and rich history that we didn’t want to lose by creating replacement terminology. We also preserved tagging history by mapping existing values to values in the new taxonomy. Though it was a lot of work, the resulting taxonomy was much more user-friendly, and it didn’t include an Other category.
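Carrying tagging history forward amounts to rewriting each old tag assignment through the old-to-new ID mapping. A minimal sketch, assuming tags can be exported as (content ID, term ID) pairs and that discarded terms are simply absent from the mapping; both formats are hypothetical.

```python
def remap_tags(assignments, id_map):
    """
    assignments: iterable of (content_id, old_term_id) pairs (hypothetical export format).
    id_map: old term ID -> new term ID; discarded old terms are absent.
    Returns (remapped, unmapped): sorted unique new pairs, and the pairs that could not be carried over.
    """
    remapped, unmapped = set(), []
    for content_id, old_id in assignments:
        new_id = id_map.get(old_id)
        if new_id is None:
            unmapped.append((content_id, old_id))   # term was discarded; tag not carried over
        else:
            remapped.add((content_id, new_id))      # set drops duplicates when two old terms merge
    return sorted(remapped), unmapped


# Illustrative data only: two old terms merged into new term 1, one term discarded.
remapped, unmapped = remap_tags([("doc1", 101), ("doc1", 102), ("doc2", 104)], {101: 1, 102: 1})
```

Here `remapped` is `[("doc1", 1)]` and `unmapped` is `[("doc2", 104)]`. Reviewing the unmapped list is a useful last check that nothing valuable was discarded by accident.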
