The growth and usage of data have ushered in a new era in Data Management, especially in Data Governance. That era is marked by trends and technologies such as Big Data, mobile computing, the Cloud, social media, and machine-generated sensor data.
According to TopQuadrant co-founder, CMO and VP of Professional Services Robert Coyne, accounting for that deluge of data will require much more than a human understanding of it:
“We need to get an understanding of things and maintain that understanding of things. And not just between human beings, but between human beings and the systems in which we all use and interact with.”
The application of Smart Data (standards-based Semantics) is central to providing that understanding between the machines that actually generate data and those that manage it. Its incorporation into technologies such as Cognitive Computing, predictive analytics, and Natural Language Processing is indicative of this fact.
However, it can also provide utility to the enterprise at a much more granular level by significantly improving basic Data Governance processes: reducing costs, apportioning resources better, and providing a meaning for data and their attributes that is not only automated but largely self-perpetuating.
According to Coyne, the governance of Reference Data (which is closely linked to Metadata Management and Governance) makes for an ideal starting point for Data Governance since:
“Reference Data is complicated enough and to get Data Governance in place for it could help you to understand how to define roles and processes around that so you could see how it could be extended and enhanced for Data Governance in general.”
In this respect, it is helpful that Reference Data is closely linked to Metadata. Reference Data are those which make other data meaningful and which provide context to other data “in terms of the larger world,” Coyne stated. The classic example of Reference Data is country codes, but there are numerous enterprise examples pertaining to Master Data Management (MDM) systems, and various classifications of customers and products.
Coyne referenced a recent survey conducted by TopQuadrant to assess how various organizations were governing their Reference Data. The most advanced ones utilized MDM systems with plug-ins for Reference Data that do not cover all of the requirements for these types of data, while most employed either spreadsheets or in-house systems that were frequently siloed within individual business units.
As a result, common Data Governance problems for Reference Data include duplicate data (which costs both material resources and the human resources used to manage them), poor data quality, redundancies, and difficulty keeping Reference Data up to date. A white paper produced by TopQuadrant estimates that for large enterprises, siloed approaches to managing Reference Data are akin to paying for a pair of full-time employees. In highly regulated industries such as finance or healthcare, Reference Data that have not been sufficiently updated in all facets of the enterprise could result in hefty fines. Coyne outlined the typical situation enterprises face when applying their individual use case solutions for Reference Data management:
“If you have a Data Steward and he’s keeping track of Reference Data, but then he realizes that he needs an additional field or two, in other kinds of environments, he might have to go to IT and say add another column. Well, that cycle might take six months. Whereas in this environment because it’s semantics-based and model-driven, the steward can go over to the business model, add a property himself, come back to his Reference Data model screen and the field shows up and he has the information. It could take half an hour, or less.”
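The workflow Coyne describes can be pictured with a small sketch. This is not TopBraid's API; the `Model` class, the `render_fields` function, and the sample country record are hypothetical names invented for illustration. The sketch only assumes that the edit screen is generated from the model at the time it is displayed, rather than from a fixed database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    """Hypothetical stand-in for a business model: class name -> property names."""
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def add_property(self, cls: str, prop: str) -> None:
        # The steward's change: one new entry in the model, no schema migration.
        self.properties.setdefault(cls, []).append(prop)


def render_fields(model: Model, cls: str, record: Dict[str, str]) -> List[str]:
    """Build the edit-screen rows by reading the model each time it is called."""
    return [f"{prop}: {record.get(prop, '')}" for prop in model.properties.get(cls, [])]


model = Model({"Country": ["code", "name"]})
record = {"code": "DE", "name": "Germany"}
before = render_fields(model, "Country", record)

model.add_property("Country", "currency")  # the steward adds a property
record["currency"] = "EUR"                 # and fills in the new field
after = render_fields(model, "Country", record)

print(before)
print(after)
```

Because the screen is derived from the model on every call, the new field appears without any change to the rendering code, which is the point of the half-hour turnaround in the quote.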
Semantics has been termed Smart Data partly because it propagates meaning in machine processes: the data are self-describing, particularly when linked with relevant Metadata. Part of the utility of applying this technology to Data Governance, and to Reference Data in particular, is that updating the data in one place automatically updates them everywhere, simultaneously, in any number of systems that use them. There is thus less reliance on writing code in Reference Data governance platforms that utilize Smart Data, such as TopQuadrant’s recently released TopBraid Reference Data Manager (which was unveiled at the Enterprise Data World 2015 Conference). Such a tool enables fewer governance personnel to do holistically, throughout the entire enterprise, what previously required long-term, intensive collaboration between governance personnel and IT.
Consequently, such governance solutions facilitate a degree of interactivity between data and their systems that is virtually unparalleled in environments that are not model-driven and semantics-based. These platforms can automatically link Reference Data with definitions provided in business glossaries as well as link different sets of Reference Data to each other. Moreover, they have an intrinsic relationship with Metadata. Reference Data actually hinges on a number of different facets of Metadata, including those that specify what the Reference Data mean, where they come from, when they were last changed, when they need to be updated, and several additional facets depending on the type of Reference Data.
“All of that information is easy in a semantic environment to add,” Coyne revealed. “One of the key drivers in a high level sense of this solution is being able to take Reference Data sets and code lists and surround them with Metadata so they can be better understood and better tracked.”
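One way to picture a code list "surrounded" with Metadata is as a set of subject-predicate-object statements, the shape that semantic standards such as RDF use. The sketch below is plain Python, not RDF tooling, and every identifier, predicate name, and date in it is an invented example rather than anything taken from TopQuadrant's product.

```python
from datetime import date

# Each fact is a (subject, predicate, object) triple. All values are illustrative.
triples = {
    ("code:DE", "label", "Germany"),
    ("code:FR", "label", "France"),
    ("codelist:countries", "hasMember", "code:DE"),
    ("codelist:countries", "hasMember", "code:FR"),
    # Metadata about the code list itself: meaning, origin, change tracking.
    ("codelist:countries", "definedBy", "glossary:Country"),
    ("codelist:countries", "source", "ISO 3166-1"),
    ("codelist:countries", "lastChanged", date(2015, 1, 15)),
    ("codelist:countries", "reviewDue", date(2015, 7, 15)),
}


def objects(subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def overdue(today):
    """Return the subjects whose review date has already passed."""
    return sorted(s for s, p, o in triples if p == "reviewDue" and o < today)


print(objects("codelist:countries", "source"))
print(overdue(date(2015, 8, 1)))
```

Adding another facet of Metadata in this shape means adding one more triple; no existing statement or lookup function has to change.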
Coyne also noted that in some instances, more than 50 percent of an enterprise’s data amounts to Reference Data. As such, a standards-based semantic approach to governing that data could greatly aid in ensuring regulatory compliance, particularly in industries such as healthcare or finance in which the number of regulations is increasing substantially. The automatic updating aspect of model-driven, standards-based Reference Data can help ease that process which, in the finance industry in particular, is certainly formidable. Coyne observed:
“The financial industry is facing a very long, unprecedented upgrade to their systems which is going to take many, many years to get on top of sufficiently. There’s lots of risk analysis dependencies and governance dependencies and financial risk and loss. They need much better governance and interoperability between different sources of data.”
More than Governance
The utility of a semantics approach to link different types of data and entire systems of data ultimately extends well beyond the needs of Data Governance. Coyne mentioned that there was a significant amount of discussion at EDW 2015 regarding financial industry standards, including an attempt to determine uniform terminology for a Financial Industry Business Ontology (FIBO) standard. Additionally, a standards-based approach can play a pivotal role in linking legacy systems to some of the newer platforms for Data Management, without relying on reams of documentation for code that was created long ago by personnel no longer with an organization. Coyne stated:
“This technology—from my point of view—is so revolutionary because you have a model that from the very beginning is machine processed. It’s an ontology and you can build your understanding of requirements and of what the ontology needs to represent. But then it becomes part of the runtime system and it’s queryable so that even somebody that wasn’t part of that can later send queries to the model and say, ‘what do you represent’, and get it out. To me, that’s going to be absolutely revolutionary of infrastructures.”
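The idea of asking a model "what do you represent" can be sketched as follows. In a standards-based system this would typically be a SPARQL query over an RDF or OWL ontology; the version here is a plain Python stand-in, and the `MODEL` contents, the `match` helper, and the predicate names are all hypothetical.

```python
# The model is stored as data, so it can be queried at runtime like any other data.
MODEL = [
    ("Country", "is_a", "Class"),
    ("Currency", "is_a", "Class"),
    ("code", "domain", "Country"),
    ("name", "domain", "Country"),
    ("usesCurrency", "domain", "Country"),
    ("usesCurrency", "range", "Currency"),
]


def match(pattern):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in MODEL if all(p is None or p == v for p, v in zip(pattern, t))]


def what_do_you_represent():
    """List every class in the model together with the properties defined on it."""
    classes = [s for s, _, _ in match((None, "is_a", "Class"))]
    return {c: [s for s, _, _ in match((None, "domain", c))] for c in classes}


summary = what_do_you_represent()
print(summary)
```

Someone who never saw the original requirements can recover the structure from the running system in this way, without consulting separate documentation.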
Moreover, this model-driven approach can ultimately help to account for the exponential growth of Big Data, and of data in general, in the coming years. With the type of interoperable interconnectivity that the Internet of Things either will facilitate or is already facilitating (depending on whom you talk to), Semantics technology may be the enterprise’s best, and only, hope to account for the salient issues of governance, security, and privacy that pertain to this data deluge. Coyne opined:
“No matter how much data you have, you have to wonder how you’re going to get value from it and protect it if you don’t understand the meaning of it. Where we’re coming from is semantic standards ways of enabling machine readable meaning, which is something that’s vital. Now exactly how that plays out in this gigantic space of Big Data is not entirely clear. But if you can’t even manage regular data, on an ongoing basis, [such as] small code lists, how can you [manage Big Data sets]?”