By Jim Wessely
Trying to get agreement on what a taxonomy is (in regard to unstructured data, not biology, where it is pretty clear) can be very frustrating if not downright futile. I’ve heard lots of different definitions tossed about by many well-educated people, even among those of us who practice the art. I once almost got sucked into the black hole of trying to come up with an agreed-upon definition of what a taxonomy is in the sense of content or knowledge management. Luckily I escaped unscathed. And even though I have it pretty straight in my own mind and have probably created a couple hundred “taxonomies,” I still can’t really express what I think might be a clear definition. I find that to be pretty weird, actually.
Let’s take this discussion back a step or two. There is a lot of talk these days about “unstructured” data and information in general, and it’s about time as far as I’m concerned. It is still a confusing topic for a lot of people, however, even though unstructured information has been around for a few thousand years longer than structured information. I certainly don’t blame people for getting confused about this area though. A lot of the things that are written these days seem to take very different approaches to the topic.
A lot of jargon is being used in discussions of unstructured content, and that can be really confusing at times. There are terms like taxonomies and ontologies, text analytics and text mining, information discovery, content management, digital asset management, records management, document management, and so on. And that’s not to mention things like auto-categorization, entity extraction, ETL, RDF, NoSQL, and lots more terms and acronyms that mean nothing to most people. Where will it end?
Well, that’s an interesting question. It seems that everywhere you look somebody has an idea of where all of this is headed. Yes, I have some ideas of future scenarios, too, but I’ll keep those to myself for now.
What seems clear to me, though, is that we are (finally) reaching the point where companies are starting to exploit their unstructured assets to real business advantage. Perhaps as a result, and perhaps just because it is about time, investments in technologies for working with unstructured data types are beginning to emerge. There are even a few application areas being addressed, such as “voice of the customer” and sentiment analysis, or auto-categorization for knowledge and content management.
Perhaps we are seeing the dawn of a period that will place emphasis upon the value that can be gained by working with unstructured data and content. I hope so.