
Six Blind Men and the Taxonomy

September 18, 2011

By Jim Wessely

Trying to get agreement on what taxonomy means or what a taxonomy is (in regard to unstructured data, not biology, where it is pretty clear) can be very frustrating, if not downright futile.  I’ve heard lots of different definitions tossed about by many well-educated people, even among those of us who practice the art.  Once I almost got sucked into the black hole of trying to come up with an agreed-upon definition of what a taxonomy is in the sense of content or knowledge management.  Luckily I escaped unscathed.  And even though I have it pretty straight in my own mind and have probably created somewhere around a couple hundred “taxonomies,” I still can’t really express what I think might be a clear definition.  I find that to be pretty weird, actually.

Let’s take this discussion back a step or two.  There is a lot of talk these days about “unstructured” data and information in general, and it’s about time as far as I’m concerned.  It is still a confusing topic for a lot of people, however, even though unstructured information has been around for a few thousand years longer than structured information.  I certainly don’t blame people for getting confused about this area though.  A lot of the things that are written these days seem to take very different approaches to the topic.

A lot of jargon is being used in discussions of unstructured content, and that can be really confusing at times.  Terms like taxonomies and ontologies, text analytics and text mining, information discovery, content management, digital asset management, records management, document management, and so on, and so on.  And that’s not to mention stuff like auto-categorization, entity extraction, ETL, RDF, NoSQL, and lots more acronyms that mean nothing to most people.  Where will it end?

Well, that’s an interesting question.  It seems that everywhere you look somebody has an idea of where all of this is headed.  Yes, I have some ideas of future scenarios, too, but I’ll keep those to myself for now.

What seems clear to me, though, is that we are (finally) reaching the point where companies are starting to exploit their unstructured assets to real business advantage.  Perhaps as a result, and perhaps just because it is about time, investments in technologies for working with unstructured data types are emerging.  There are even a few application areas being addressed, such as “voice of the customer” and sentiment analysis, or auto-categorization for knowledge and content management.

Perhaps we are seeing the dawn of a period that will emphasize the value that can be gained by working with unstructured data and content.  I hope so.


Creative Commons License photo credit: zeyadbharucha

About the author

Jim Wessely is president and co-founder of Advanced Document Sciences, a consulting firm with a primary focus upon enterprise information organization and access. He has worked with unstructured information technologies since 1985, and spent many years researching and designing application solutions using text analysis, text mining, and unstructured information technologies. This background led Mr. Wessely to taxonomy design and implementation through content analysis. Mr. Wessely previously worked for IBM Global Services, where he helped clients to develop strategies and solutions for enterprise portals, content management, taxonomies, text analysis, and text mining. Prior to his work with IBM, Mr. Wessely was the principal architect for numerous advanced computational solutions in DuPont’s Central Research & Development, where his primary interest was unstructured information technologies and global scale information portals. Mr. Wessely has been a frequent presenter at conferences in both the United States and Europe on diverse topics such as enterprise content strategies, information management, taxonomy, auto-categorization, advanced information access, and personalized information delivery.
