by Charles Roe
Humans like to classify things; we have spent our intellectual history taking apart the universe and creating (or some would say discovering) the underlying structure of all things from chains of galaxies to quarks. We love structure; it allows us to put knowledge into more easily understood, more easily disseminated boxes that allow for clearer systems of formal classification.
The Ancient Greeks are often acknowledged as establishing the basis of western formal thought structures and systems of critical analysis known collectively as philosophy. The term is generally credited to the great Ionian mathematician, scientist, and religious mystic Pythagoras who lived circa 570 BCE. Parmenides, circa 500 BCE, is given credit for the first discussions on the ontological categorization of existence (though the dates are not entirely agreed upon). Etymologically the term ontology, like most philosophical terminology, comes from Greek and means essentially “the study or theory of being or that which is.” Yet, historically the first known written use of the word comes from the Latin ontologia in the early 17th century.
The somewhat vague terms “often acknowledged,” “generally credited to,” “given credit for,” and “first known written use” were purposely employed in the previous paragraph to make a point: the ancient history of philosophy, just like the etymology of words and the current uses of such terminology, is and always will be debated by those who care about such debates. The term ontology is an apt example of such a word. We know the accepted history of its use and its etymology, we know how it has been commonly used throughout history, and we can study how it changed with the advent of AI and computer science in the mid-1970s. Yet, how many people today actually agree on what an “ontology” is, relative to its modern classificatory sense? How does it differ from a vocabulary? A taxonomy? What are some of the ontologies used today in Data Management?
The Prevailing Trend
The modern history of ontology really begins with Artificial Intelligence (AI) research from the 1970s and 1980s. According to Tom Gruber, a pioneer in AI exploration and semantic web technologies, AI researchers borrowed the term ontology from philosophy as an apt name for the system of ordering knowledge that they required:
“In philosophy, one can talk about an ontology as a theory of the nature of existence (e.g. Aristotle’s ontology offers primitive categories, such as substance and quality, which were presumed to account for All That Is). In computer and information science, ontology is a technical term denoting an artifact that is designed for a purpose, which is to enable the modeling of knowledge about some domain, real or imagined.”
Mr. Gruber wrote two famous papers in the 1990s that cemented the use of the word ontology within the contemporary sphere of computer science:
- “Toward Principles for the Design of Ontologies Used for Knowledge Sharing” (1993)
- “A Translation Approach to Portable Ontology Specifications” (1993)
The first one set the stage for the use of the term ontologies “as a way of specifying content-specific agreements for the sharing and reuse of knowledge among software entities.” The second defined an ontology as “an explicit specification of a conceptualization,” while a conceptualization in Mr. Gruber’s terms is “an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.”
Thus, an ontology as defined and used within modern computer science (and now many other fields) is, in simple terms, a system for the formal organization of information. This relates to the ancient philosophy on the nature of existence in that both systems classify “being/that which exists” whether they are subjects/objects in a domain, conceptual models for automated reasoning, or categories of individual identity.
Some other helpful definitions include:
- Lars Marius Garshol, Ontopia: “[t]he core meaning within computer science is a model for describing the world that consists of a set of types, properties, and relationship types. Exactly what is provided around this varies, but this is the essentials of an ontology. There is also generally an expectation that there be a close resemblance between the real world and the features of the model in an ontology.”
- Nicole Washington & Suzanna Lewis, Nature Education: “An ontology is a logic-based organizational structure for knowledge. Ontologies speed genetic discovery by allowing researchers to quickly find and compare data from multiple sources.”
- Roberto Navigli and Paola Velardi: “The goal of a domain ontology is to reduce (or eliminate) the conceptual and terminological confusion among the members of a virtual community of users (for example, tourist operators, commercial enterprises, medical practitioners) who need to share electronic documents and information of various kinds.”
Taxonomy versus Ontology?
This article cannot cover all of the extensive research that has gone into delineating the differences between modern taxonomies and ontologies. Books have been written on the subject, and there is much disagreement and debate over their particular differences and uses within the Data Management industry itself, let alone in other industries. But some demarcation is helpful.
In his article “Ontology and Taxonomy,” Steve Hoberman used his own expertise along with many quotes from specialists in the field to gain more clarity on the differences between the two terms. Some of the highlights from that article will aid in a better understanding:
- Gordon Everest: “The synonym for ontology would be model (of something in data), and the synonym for taxonomy would be tree.”
- Robert Ruffin: “The taxonomy of a tiger is that it is a subtype of cat (classification), but an ontological description may be that the tiger has a relationship to Asia, the continent on which it lives.”
- “A taxonomy is an ontology in the form of a hierarchy,” and “Whereas ontologies can have any type of relationship between categories, in a taxonomy there can only be hierarchies.”
Christine Connors, the Principal at TriviumRLG LLC, offers a further differentiation:
“Efforts are underway to transform semantic systems into more than just known item or NLP derived labeling to systems capable of contextual understanding. Ontologies are the means by which much of this effort will be accomplished in the short term. An ontology is more advanced than a taxonomy as it can contain self-defined relationships beyond that of parent-child. It can also be used to infer data and reason over information.”
Thus, both taxonomies and ontologies are in essence vocabularies that offer a structured means of classification. Whereas taxonomies exist within a strictly hierarchical scheme and work well for the classification of such elements as Reference Data, Master Data Management, and distributed computing systems, ontologies expand the relationship possibilities to levels that taxonomies do not.
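The distinction drawn above can be made concrete with a minimal sketch in Python that reuses Robert Ruffin's tiger example. Everything in it is an illustrative assumption rather than an established vocabulary: the term names, the relation names `is_a` and `lives_in`, and the helper functions are hypothetical. The taxonomy is a plain child-to-parent mapping, while the ontology is a set of typed statements that can carry non-hierarchical relationships and support a very simple form of the inference Christine Connors mentions.

```python
# Taxonomy: every term has exactly one parent, so a dict is enough (a tree).
# All names here are hypothetical, chosen only for illustration.
TAXONOMY = {
    "tiger": "cat",
    "cat": "mammal",
    "mammal": "animal",
}

# Ontology: typed (subject, relation, object) statements.
# Relations are not limited to parent-child links.
ONTOLOGY = {
    ("tiger", "is_a", "cat"),
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("tiger", "lives_in", "Asia"),
    ("Asia", "is_a", "continent"),
}


def lineage(term, taxonomy):
    """Walk up the single-parent chain of a taxonomy."""
    chain = []
    while term in taxonomy:
        term = taxonomy[term]
        chain.append(term)
    return chain


def related(subject, relation, triples):
    """Return every object linked to `subject` by `relation`."""
    return sorted(o for s, r, o in triples if s == subject and r == relation)


def infer_is_a(term, triples):
    """Infer every class reachable through is_a, including indirect ones."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in related(current, "is_a", triples):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


if __name__ == "__main__":
    print(lineage("tiger", TAXONOMY))
    print(related("tiger", "lives_in", ONTOLOGY))
    print(infer_is_a("tiger", ONTOLOGY))
```

The taxonomy can only answer "what is a tiger a kind of?", whereas the set of statements can also answer "where does a tiger live?" and derive facts that were never stated directly, such as a tiger being a mammal. Production ontologies express the same idea in formal languages such as OWL rather than in ad hoc Python structures.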
An appropriate (but simplified) example would be a Knowledge Base (KB). In a simple structural-taxonomic system, each file in the KB is assigned to one node (though possibly more); each child has a single relation to one parent in a strict hierarchy, similar to the directory structure of a PC hard drive. In an ontological KB structure, there are multiple parents tied to multiple children in a “poly-hierarchy” with highly structured and formal constraints. A taxonomy is really a tree or directory structure, while an ontology could be the forest (or entire KB); yet, the forest is far more formal about the semantic structuring of classes, attributes, relations, objects, rules and restrictions than the tree is, due to the increased complexity of the varying relationships.
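The Knowledge Base comparison can be sketched in a few lines of Python. The file and category names are invented for illustration, and the sketch leaves out the formal constraints a real ontology would add. It only contrasts a strict single-parent hierarchy, where each item has exactly one path to the root, with a poly-hierarchy, where an item can sit under several parents at once.

```python
# Taxonomy-style KB: one parent per node, like a directory tree.
# All file and category names are hypothetical.
TREE_PARENT = {
    "reset-password.md": "Accounts",
    "Accounts": "Support",
    "Billing": "Support",
}

# Poly-hierarchy KB: a node may list several parents.
POLY_PARENTS = {
    "refund-policy.md": ["Billing", "Legal"],
    "Billing": ["Support"],
    "Legal": ["Company"],
    "Support": [],
    "Company": [],
}


def tree_path(node, parent_of):
    """Return the single root-to-node path in a strict hierarchy."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def all_ancestors(node, parents_of):
    """Collect every ancestor of a node when multiple parents are allowed."""
    seen = set()
    stack = list(parents_of.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents_of.get(current, []))
    return seen


if __name__ == "__main__":
    print(tree_path("reset-password.md", TREE_PARENT))
    print(sorted(all_ancestors("refund-policy.md", POLY_PARENTS)))
```

In the tree, a file can be reached by exactly one route, just as on a hard drive. In the poly-hierarchy, the hypothetical refund policy belongs under both Billing and Legal, so a search starting from either branch will find it.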
Some Examples of Ontologies
There are literally thousands of existing ontologies in the world today in virtually every industry from software engineering to medical research, e-commerce to banking, linguistic processing to document publishing and so forth. Even in the Data Management industry alone there are too many to easily discuss. Thus, to distill the topic down to give some well-defined examples, only a few mentioned in a recent DATAVERSITY™ webinar will be noted:
- Dublin Core® MetaData Initiative (DCMI): First conceived in 1994 during the 2nd International World Wide Web Conference, DCMI was created to provide “core metadata vocabularies in support of interoperable solutions for discovering and managing resources.” Grounded in the Dublin Core Metadata Element Set, DCMI promotes open consensus building in the development and maintenance of metadata vocabularies, worldwide participation in the project, neutrality in the adoption and use of the standards, and a comprehensive cross-disciplinary focus to break down “information silos” so that all data is shared data.
- Good Relations Ontology: Started in 2008, Good Relations is a simple but powerful e-commerce ontology, a “vocabulary for publishing all of the details of your products and services in a way friendly to search engines, mobile applications, and browser extensions.” It seeks to streamline the e-commerce process and is the only OWL DL ontology that both Yahoo! and Google support. It comes with a Creative Commons Attribution 3.0 license, so it is Open Source.
- Web Ontology Language (OWL): OWL was created to facilitate “greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics,” especially for the Semantic Web. A W3C (World Wide Web Consortium) standard, OWL is intended to aid in the structuring of the Semantic Web through the adoption of common systems for processing Web content.
For more examples, or to do further searches on ontologies see:
- Sindice.com: A Semantic Web Index
- Umbel.org: Umbel is a Vocabulary and Reference Concept Ontology
- Bioontology.org: A good search site for many different biomedical ontologies
- Cyc.com: Founded in 1994, Cycorp works to standardize, develop, commercialize and do more research into AI.
- Open Directory Project: Lists a number of published ontologies
- Wikipedia’s listing on Ontology (information science) also has a long list of many known published ontologies at the bottom of the page.
Conclusion – Unstructured Data and Data Ontologies
The initial statements from Tom Gruber way back in 1993 during the bygone days of Web development still ring true today:
“Knowledge-based systems and services are expensive to build, test, and maintain. A software engineering methodology based on formal specifications of shared resources, reusable components, and standard services is needed. We believe that specifications of shared vocabulary can play an important role in such a methodology.”
The rapid growth of Unstructured Data over the past few years has spawned an explosion of Big Data products meant to aid in the “structuring” of the massive amounts of data created from blogs, social networking, video and a host of other “unstructured” elements. Enterprises worldwide are hurrying to collect and analyze petabytes and even exabytes of data and translate them into usable information so they can achieve a competitive edge in the marketplace. Yet, as data volumes expand to ever greater quantities, as non-relational data systems continue to enter the market, and as the complexity of systems, platforms, products, and codes continues to increase, new “common” solutions are needed. Data pandemonium is rampant in the world today and the standardization of data systems through the adoption of common ontologies is still a hope of many. The dream of Tim Berners-Lee and the Semantic Web or Web 3.0 (call it what you will) is happening with the adoption of and further work on OWL, RDF, XML and others. But the train is starting to go out of control. Can data professionals contain the Big Data beast through Tom Gruber’s “shared vocabulary”? Or do we already have such a system in place? Or, rather than a single unifying system, are the many systems already enough, and has the Lernaean Hydra of Ancient Greek Mythology with its many heads already been slain? Such questions are best left to further discussion. Parmenides certainly had no idea what his philosophical musings would lead to some 2500 years after his death.