Semantics technologies have emerged as one of the most influential means of Data Management. A March report from Gartner identified semantics technologies as one of the top trends impacting information infrastructure in 2013.
Semantics technologies have a utilitarian aspect that can bring value to virtually any organization, regardless of industry. They are well suited to federating data and to handling the variability of Big Data, because they reduce data to a single format regardless of its original structure. They also facilitate agility and help to simplify data for business users. Recent developments have made Semantics much more user-friendly, enabling executives and front-line staff alike to identify business value quickly.
According to Information Management Solutions Consultant’s Chris Moran, one of four participants in a round table discussion on “Integrating Semantic Technology with Enterprise Information Management” at Enterprise Data World 2013 Conference & Expo,
“In the past 30 years or so, the meaning of data has been in the structure that we store it in. Semantics technology is a shift away from that way of thinking about information. Instead, we’re putting the meaning into the data itself, so that we’re no longer bound by structure. Semantic technology is any of a collection of tools and approaches and design systems that enable us to make the data itself contain its own meaning.”
The three primary technologies associated with Semantics are the Resource Description Framework (RDF), a standard model for describing data; the SPARQL Protocol and RDF Query Language (SPARQL), RDF’s principal query language, which also functions as a protocol; and the Web Ontology Language (OWL), an ontology language that describes data’s attributes and relationships.
Together, these technologies allow virtually all data to be expressed in the form of triples, regardless of whether the data is numeric, video, text, or any other format. The idea originated as a means of facilitating interactivity on the World Wide Web, where the World Wide Web Consortium (W3C) explains it as “the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link.” For Enterprise Information Management, however, a triple is a descriptor of data that provides a subject, a predicate, and an object: for example, a customer (subject) is located in (predicate) a particular city (object).
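To make the subject, predicate, and object structure concrete, here is a minimal Python sketch. It is not drawn from any RDF library: the `Triple` type, the `match` helper, and every `http://example.org/` URI are hypothetical names chosen for illustration only.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; real deployments would use their own URIs.
EX = "http://example.org/"

triples = [
    Triple(EX + "customer/42", EX + "hasName", "Acme Corp"),
    Triple(EX + "customer/42", EX + "locatedIn", EX + "city/Boston"),
    # Non-textual assets such as video are described the same way.
    Triple(EX + "video/7", EX + "depicts", EX + "customer/42"),
]


def match(data, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


about_customer = match(triples, subject=EX + "customer/42")
```

Here `about_customer` holds the two statements made about the customer. Leaving positions open as wildcards is loosely how a query language such as SPARQL matches patterns against triples, although a real query engine does considerably more.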
Solutions Architect Peter Lawrence of TopQuadrant implied that the capacity for utilizing myriad sources and forms of data and reducing them all to triples so that users can readily analyze them is one of semantics’ primary benefits:
“Reducing information to semantics structure and to just a simple triple is an ideal way of merging lots and lots of data streams or threads of information. It’s like taking information and decomposing it down to its atomic level, and then you can recombine it back together.”
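The decompose-and-recombine idea in this quote can be sketched in a few lines of Python. The two source records, the `customer:42` identifier, and both helper functions are assumptions made up for the example; they do not represent any vendor’s implementation.

```python
def record_to_triples(subject, record):
    """Decompose one record into a set of (subject, predicate, object) triples."""
    return {(subject, key, value) for key, value in record.items()}


def recombine(triples, subject):
    """Rebuild a single view of one subject from the merged triples.

    Simplification: a dict keeps only one value per predicate.
    """
    return {p: o for s, p, o in triples if s == subject}


# Two hypothetical sources describing the same customer.
crm_row = {"name": "Acme Corp", "city": "Boston"}
billing_doc = {"name": "Acme Corp", "balance": 1250}

# Once both sources are triples, merging them is a plain set union;
# the duplicated "name" statement collapses into one.
merged = (record_to_triples("customer:42", crm_row)
          | record_to_triples("customer:42", billing_doc))

profile = recombine(merged, "customer:42")
```

Because every source ends up in the same three-part shape, neither source needs to know about the other’s schema before the merge.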
Storage and Agility
Although Semantics perhaps works best in the world of data federation and Enterprise Content Management, it also provides tangible benefits for storing data. Conventionally, Semantics data is housed in dedicated Semantics stores (often called triple stores). However, because all data can be reduced to triples, enterprises can use the technology with the data warehousing approach of their choice. Relational databases can hold Semantics data in three-column tables, with one column each for subject, predicate, and object; the same triples can also be expressed as graphs for graph databases. This aspect of Semantics results in effective data aggregation, which simplifies the storage process and helps arrange data in a way that benefits the actual business-end users rather than only IT teams.
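The three-column layout described above can be tried with Python’s built-in `sqlite3` module. This is only a sketch under simple assumptions: the table name `triples`, its column names, and the sample identifiers are invented for the example, and a production store would add indexes and typed values.

```python
import sqlite3
from collections import defaultdict

# In-memory relational database with one three-column table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("customer:42", "name", "Acme Corp"),
        ("customer:42", "locatedIn", "city:Boston"),
        ("city:Boston", "partOf", "state:MA"),
    ],
)

# Relational view: everything stated about one subject.
rows = conn.execute(
    "SELECT predicate, object FROM triples "
    "WHERE subject = ? ORDER BY predicate",
    ("customer:42",),
).fetchall()

# Graph view of the same data: each subject is a node and each
# (predicate, object) pair is a labeled outgoing edge.
graph = defaultdict(list)
for s, p, o in conn.execute("SELECT subject, predicate, object FROM triples"):
    graph[s].append((p, o))
```

The same rows serve both views, which is why the choice of storage engine can be left to the enterprise.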
Ultimately, the utilitarian attributes of Semantics technology manifest themselves in the facilitation of agile approaches. The reduction of data to triples enables users to make changes quickly, so they do not necessarily have to anticipate the Data Management problems that can arise from mergers, acquisitions, and other unforeseen circumstances. Because the data is stored in a format that exists outside the structure of any particular system, users can readily change it and move it to suit their present and future needs. Additionally, triples represent data in much the same way that people think about it, which makes Semantics fairly intuitive. That intuitiveness is critical for agile, on-the-fly modifications, or even for automated ones, as the aforementioned Gartner report suggests:
“Semantic technologies extract meaning from data, ranging from quantitative data and text, to video, voice and images. Many of these techniques have existed for years and are based on advanced statistics, data mining, machine learning and knowledge management…big data…requires semantic technology that makes sense out of data for humans, or automates decisions.”
The Art of Expression
Although Semantics technology reduces data to simple triples, it can still provide a degree of expressiveness about data that is vital to informed decision-making. Even the most complicated sets of data, such as sensor data in Big Data volumes, can be expressed as a series of triples. More importantly, the expressiveness of Semantics depends largely on how detailed and relevant its ontologies are. It is crucial to involve business users in expressing the rules that will assist their processes, in much the same way that Data Governance programs seek to involve them in creating definitions for Metadata. The difference, however, is that Metadata is information about data’s structure kept in a separate repository, whereas Semantics brings those definitions (or ontologies) closer to the data itself by defining the data in terms of those descriptions and relationships.
It may also be beneficial to involve a formal ontologist in the creation of an ontology after business analysts have given their input. As professionals in this specific area of Semantics, ontologists can take the definitions provided by business users and implement them so that the ontologies provide the most accurate description (or model) of the data. The potential drawback of this approach is that the ontologist’s input puts greater distance between the business users and their own definitions of the data. One reason Semantics is gaining ground is its simplicity of use, which becomes more complicated (ideally in a valuable way) when another specialist layer is placed between the business and the data.
Still, there is no disputing that properly created ontologies, whether they involve a formal ontologist or simply the input of business users, function as a means of creating the structure into which data is formed. The result is similar to, yet significantly more expressive than, modeling data in conventional relational databases.
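As a rough illustration of how an ontology adds expressiveness beyond a fixed table schema, the Python sketch below derives new facts from a small class hierarchy. It is a simplification and not OWL itself: the class names, the `type` predicate, and the `infer_types` function are hypothetical, and a real reasoner handles far richer rules.

```python
# Hypothetical business definitions: each class names its parent class.
SUBCLASS_OF = {
    "PreferredCustomer": "Customer",
    "Customer": "Party",
}

# The only fact actually stored about this customer.
facts = {("customer:42", "type", "PreferredCustomer")}


def infer_types(known, subclass_of):
    """Add every type implied by walking up the class hierarchy."""
    inferred = set(known)
    for s, p, o in known:
        if p != "type":
            continue
        seen = {o}  # guards against an accidental cycle in the hierarchy
        parent = subclass_of.get(o)
        while parent is not None and parent not in seen:
            inferred.add((s, "type", parent))
            seen.add(parent)
            parent = subclass_of.get(parent)
    return inferred


all_facts = infer_types(facts, SUBCLASS_OF)
```

A query for every `Customer` would now find `customer:42` even though that statement was never entered explicitly; the definitions supplied by the business did the work.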
A Long Road
Semantics technologies were initially designed by Sir Timothy Berners-Lee (the creator of the World Wide Web) to provide a more intelligent way of linking Web pages so that information could be delivered more effectively. Since then, the technology has grown from conventional text analytics without standards into something that influences a number of different aspects of Enterprise Information Management. It provides an agile means of federating, aggregating, and storing data in a way that enables users to understand data intuitively, regardless of structure. Its role in the future of information management was summed up by Sean Martin of Cambridge Semantics:
“There’s something fundamentally happening at this moment in computing; for the first time we have enough horsepower in our cheap RAM, multicore CPUs, fast internet connections etc. that we can arrange information for the benefit and convenience of the people. We can make the data smarter so we can make the applications less smart. We’ve never been in this situation before. The orders of magnitude and productivity that we’re starting to witness as we deploy Semantics systems are quite breathtaking.”