
A Brief History of Data Ontology


The history of data ontology arguably begins with the development of ontology as a concept in ancient Greece, in the fourth century B.C.E., by the philosopher Aristotle. Ontology is a branch of philosophy used to classify and explain “that which exists,” or to answer the question “What is real?” It relies on language as a tool for both thinking and communication, and involves asking which things exist, how they are related to one another (providing context), and how to classify them according to their similarities and differences.

Examples of ontology questions can include:

  • What is a bicycle?
  • Do souls exist?
  • Are emotions real?
  • What is nothing?
  • If numbers don’t have mass, are they an intellectual illusion?

Ontology asks questions similar to those sometimes asked by children – the kind that often short-circuit adult brains. (Aristotle created a framework for answering such questions logically.)

“Data ontology” applies the philosophical concepts of ontology to modern data processing systems. In simple terms, data ontology is a formal system used for organizing and processing data. The ancient philosophy of ontology, which deals with the nature of existence, is being combined with computer science in an attempt to describe everything that is useful to a specific project or business transaction. (Efforts to represent all things in the universe, and their relationships, would be an infinite task, so limits and restrictions are a necessity in data ontology.) 

Ideas, entities, events, and their relationships are used in data analytics to predict future events. The more accurate and inclusive the representations of reality, the more accurate the predictions.
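
To make this concrete, here is a minimal sketch (in Python) of what it means to record entities and their relationships explicitly so that the relationships themselves can be queried. The entities and relations are hypothetical examples, not part of any particular ontology standard:

```python
# A minimal sketch: entities and their relationships stored as simple
# subject-relation-object facts. All names here are hypothetical examples.
facts = [
    ("Aristotle", "is_a", "philosopher"),
    ("ontology", "developed_by", "Aristotle"),
    ("ontology", "is_a", "branch of philosophy"),
    ("data ontology", "applies", "ontology"),
]

def related_to(entity):
    """Return every (relation, other entity) pair that involves the entity."""
    return [(rel, obj) for subj, rel, obj in facts if subj == entity] + \
           [(rel, subj) for subj, rel, obj in facts if obj == entity]

print(related_to("ontology"))
# [('developed_by', 'Aristotle'), ('is_a', 'branch of philosophy'), ('applies', 'data ontology')]
```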

The 1960s and Data Ontology 

During the 1960s, computer systems began to store and manage large amounts of data – a problem, because they were not yet designed to handle the volumes being presented to them. Retrieving specific data from the large amounts stored within these vintage computers required humans with an advanced, almost intuitive understanding of their unique computer system.

Mainframe computers were still relatively new – reels of magnetic tape were used for data storage during the 1960s. They also cost hundreds of dollars for each minute of operation, primarily because database management at the time was so complicated. These vintage databases used convoluted systems and rigid hierarchical structures to locate specific data on the tapes. As a consequence, human computer specialists often had to write an entire program simply to access a specific piece of information.

The concept of data ontology came from the need for a more efficient, more functional way to access data stored within a computer system. 

The 1970s and Data Ontology

Relational databases emerged as a highly functional solution to the ever-increasing amounts of data computer systems needed to handle. They provide an easy and efficient way for businesses and individuals to record and process financial records, personnel data, and marketing information. Relational databases are essential for seamlessly accessing bank accounts, making online purchases, and conducting modern research.

Data ontology laid the foundation for relational databases.

In 1970, “A Relational Model of Data for Large Shared Data Banks,” a paper by Dr. Edgar F. “Ted” Codd, introduced a theory of database management that made the use of computers both efficient and inexpensive. His relational model (when combined with SQL – Structured Query Language) made it much, much easier to locate data. The paper describes a system for storing and accessing data in large databases without the use of a highly restrictive, inflexible internal data structure. (Most businesses currently use databases based on this paradigm, and the SQL associated with it.)

Ted Codd envisioned software that would allow its users to access data without being computer wizards. People with no technical understanding of how the computer system worked could retrieve the data they needed by simply typing in a few keywords.

Codd introduced the idea that a database could organize data into linkable – or relatable – tables with common characteristics. This method of organizing data made it possible for humans to access an entire table of related data from a data system containing multiple tables, with only a single query. This process also had the additional, unintended effect of providing businesses with a better understanding of the relationships existing within their data. The new system provided business intelligence and supported better decision-making.
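
As a rough illustration of this idea, the sketch below uses Python’s built-in sqlite3 module to create two tables linked by a shared value and then retrieve related data from both with a single query. The table names, column names, and rows are hypothetical examples, not taken from Codd’s paper:

```python
import sqlite3

# Two tables linked - or related - by a common characteristic (customer_id).
# All table names, column names, and rows are hypothetical examples.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER,  -- shared value linking the tables
                         item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'notebook'), (11, 2, 'compiler');
""")

# A single query retrieves related data from both tables, with no knowledge
# of how or where the data is physically stored.
query = """
    SELECT customers.name, orders.item
    FROM customers
    JOIN orders ON customers.customer_id = orders.customer_id
"""
for name, item in con.execute(query):
    print(name, item)   # Ada notebook / Grace compiler
```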

Don Chamberlin, a co-creator of the original SQL (Structured Query Language), said:

“Ted’s basic idea was that relationships between data items should be based on the item’s values, and not on separately specified linking or nesting. This greatly simplified the specification of queries and allowed unprecedented flexibility to exploit existing data sets in new ways. He believed that computer users should be able to work at a more natural language level and not be concerned about the details of where or how the data was stored.”

Donald Chamberlin and Raymond Boyce developed Structured Query Language during the mid-1970s. It quickly became the world’s most popular database language and was the first commercially successful, standardized computer language for relational databases.

The concept of data ontology evolved to support linking data together by defining and clarifying relationships and definitions.

The 1990s and Data Ontology

The World Wide Web, widespread internet access, and search engines all rose to prominence in the 1990s. This, in turn, led to significant increases in the amount of data being stored and processed around the world, with search engines being used to find the desired data. With the use of search engines, data ontology became increasingly important as a way of organizing data and providing it with meaning and context.

During the 1990s, AI researchers began using the term “ontology” to describe a useful system for arranging the knowledge their artificial intelligence systems needed. Tom Gruber, a trailblazer in machine learning, artificial intelligence, and semantic web technologies, wrote:

“In philosophy, one can talk about an ontology as a theory of the nature of existence (e.g., Aristotle’s ontology offers primitive categories, such as substance and quality, which were presumed to account for All That Is). In computer and information science, ontology is a technical term denoting an artifact that is designed for a purpose, which is to enable the modeling of knowledge about some domain, real or imagined.”

Tom Gruber also wrote two papers in 1993 that expanded the use of data ontology: “Toward Principles for the Design of Ontologies Used for Knowledge Sharing” and “A Translation Approach to Portable Ontology Specifications.”

In 1994, the Dublin Core Metadata Initiative (DCMI) was created to offer “core metadata vocabularies in support of interoperable solutions for discovering and managing resources.” The organization promotes open consensus building in developing and maintaining metadata vocabularies, and encourages worldwide participation in the adoption and use of standardized metadata. According to the DCMI, in Semantic Web usage the word “vocabulary” has essentially the same meaning as “ontology.”

The DCMI developed the Dublin Core, a set of 15 metadata terms that supports a very functional catalog: it describes web resources, improves SEO (search engine optimization), and combines metadata from different standards.
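
As a rough sketch of what Dublin Core metadata looks like in practice, the record below describes a hypothetical web page using a handful of the 15 terms, expressed here as a Python dictionary; the values are invented for illustration:

```python
# A minimal Dublin Core record for a hypothetical web resource, using a few
# of the 15 core terms (title, creator, subject, date, format, and so on).
dublin_core_record = {
    "title":       "A Brief History of Data Ontology",
    "creator":     "Example Author",
    "subject":     "data ontology; metadata; semantic web",
    "description": "An article tracing data ontology from Aristotle to schema.org.",
    "date":        "2023-09-01",
    "type":        "Text",
    "format":      "text/html",
    "identifier":  "https://example.com/articles/data-ontology-history",
    "language":    "en",
}

# The same terms are often embedded in a web page as <meta> tags, e.g.:
#   <meta name="DC.title" content="A Brief History of Data Ontology">
for term, value in dublin_core_record.items():
    print(f"{term}: {value}")
```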

The 2000s and Data Ontology

In the early 2000s, data industry leaders such as Tim Berners-Lee began advocating for what they referred to as “linked data.” Berners-Lee and others promoted the idea that data should be recognized for what it represents – ideas, people, places, events, activities, etc. – and linked in a way that both humans and machines can read.

In 2001, the World Wide Web Consortium (W3C) created the Web-Ontology Working Group, and then in 2005 officially transformed it into the OWL Working Group. OWL stands for “Web Ontology Language,” a semantic web language designed to represent rich and intricate knowledge about things, groups of things, and the relations between them.
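
The sketch below shows, very roughly, the kind of knowledge OWL captures – classes, a class hierarchy, and a property relating things – assuming the third-party rdflib package is installed (pip install rdflib). The example.org namespace, the classes, and the property are hypothetical:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# A hypothetical namespace for our tiny ontology.
EX = Namespace("http://example.org/vehicles#")
g = Graph()

# Two classes ("things" and groups of things) and a relation between them.
g.add((EX.Vehicle, RDF.type, OWL.Class))
g.add((EX.Bicycle, RDF.type, OWL.Class))
g.add((EX.Bicycle, RDFS.subClassOf, EX.Vehicle))

# An object property relating vehicles to their owners.
g.add((EX.ownedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.ownedBy, RDFS.domain, EX.Vehicle))

# Print the ontology in Turtle, a common serialization for semantic web data.
print(g.serialize(format="turtle"))
```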

Ontology is one of the primary building blocks for the semantic web.

The 2010s and Data Ontology

During the early 2010s, representatives from Google, Microsoft, Yahoo, and the Russian search engine Yandex discussed the development of a centralized repository for storing data ontologies and data models. The group decided to use schema.org as the location of the new schema repository. They also decided the storage site would be built organically, providing models and examples for other organizations to work with.

Initially, their data ontology system was seen as a curiosity, almost a toy, and was not taken seriously. However, the idea of having a consistent standardized method for describing and recognizing “things” on the web began to attract a growing number of organizations. 

In 2017, Google announced it would begin using schema.org as the foundation for its search engine optimization processes, which attracted even more interest. Ontology models for consumer products, medicine, automobiles, etc., were developed, and the more schema.org was used for modeling, the more other organizations became interested. Schema.org is becoming the standard for describing and locating data on the web through metadata.
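
The sketch below shows roughly what schema.org markup for a consumer product looks like, expressed as JSON-LD and built here with Python’s standard json module. The product and its values are invented; the @context, @type, and property names come from the schema.org vocabulary:

```python
import json

# A hypothetical product described with schema.org types and properties.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Commuter Bicycle",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "offers": {
        "@type": "Offer",
        "price": "499.00",
        "priceCurrency": "USD",
    },
}

# Embedded in a page as <script type="application/ld+json">...</script>,
# this is the kind of markup search engines read to recognize "things" on the web.
print(json.dumps(product, indent=2))
```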

The 2020s and the Future of Data Ontology

In May of 2023, the Systems Engineering Research Center (a University-Affiliated Research Center of the U.S. Department of Defense) and MITRE hosted a research workshop on Information Models and Ontologies. Sixty-six experts and key stakeholders from federally funded research and development centers and academia attended the workshop to discuss different approaches to designing and implementing new ontologies.

The attendees concluded they needed new data ontology models and policies. Additionally, they needed to “find ways to equitably incentivize development and use in critical problem areas that need ontologies.”

The conclusions reached by the 66 experts and key stakeholders strongly suggest the field of data ontology is still developing and evolving. It should be noted that data ontology has become an important part of the data exchange infrastructure. This means that, for the most part, large organizations will control the direction in which data ontologies evolve (unless, of course, some brilliant idea pops into someone’s head and disrupts the current trends).

ChatGPT has already been used to help develop ontologies, and it is reasonable to expect that it, and tools like it, will be used to develop new forms of data ontology in the future.