The future of Big Data depends on Smart Data. The self-describing properties of Smart Data are practically a necessity for the massive quantities and varied data types of Big Data because they facilitate:
- Unstructured and structured data aggregation and analytics: Smart Data supports rapid integration of unstructured or semi-structured data (as most Big Data is) with structured data. This lets organizations expedite analytics and derive composite value from all of their data, even recently acquired Big Data.
- Simplified and accelerated Data Modeling: Smart Data significantly reduces the complexity and foresight that most Data Modeling jobs require, decreasing time to insight and time to value for Big Data applications.
- Access control and Data Governance: Smart Data provides access control aligned with Data Governance principles for integrated data sources, preserving the order and security that are vital to integration and data access in the long term.
The power of Semantics is inexorably transforming Big Data into Smart Data. Semantics is the common technology underlying such diverse Big Data applications as the Internet of Things, Cognitive Computing, Semantic Graph Databases, Data Lakes, and Artificial Intelligence.
According to Cambridge Semantics Chief Technology Officer Sean Martin, during an Enterprise Data World 2015 Conference panel discussion, which included Dave McComb of Semantic Arts and Idriss Mekrez of MarkLogic, “Smart Data is a natural evolution of Big Data.”
Transition in Logic: From Dumb to Smart
The self-describing nature of Smart Data represents a shift in the logic applied to data-driven processes. It comes from triples, which are single-sentence descriptions of data elements in subject-predicate-object form. Before Smart Data became prevalent, ‘dumb’ or non-Semantic data derived its meaning and contextual relevance to the enterprise from specific applications and their schemas, programs, databases, and so on. Outside those applications the data lost its meaning. That made data integration a tiresome chore and largely accounts for the culture of silos. As Martin asserted, the opposite is true for Smart Data:
“The big change when you move to a set of standards and smarter data is the data starts to contain what’s needed to identify it and explain what it means. And that is independent of any application, which makes it very powerful because in a world where we’ve got an enormous amount of silo information and integration is very expensive, difficult and time consuming. Having information that self identifies and can really carry with it everything you need to do integration and can be used with software that understands those standards but has no preconception of the underlying data model – it’s a huge difference.”
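As a rough illustration of what "self-describing" means here, the sketch below stores facts as subject-predicate-object triples in plain Python tuples. This is the same shape the RDF standard uses, but no RDF library is involved. Every identifier with an `ex:` prefix is hypothetical. The point is that the vocabulary itself is stored as data alongside the facts. Here that vocabulary is a human-readable label for `ex:placedBy`. Generic code can therefore interpret the statements without a preconceived, application-specific data model.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One self-describing statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; real systems would use full URIs from a shared vocabulary.
graph = [
    Triple("ex:order42", "rdf:type", "ex:Order"),
    Triple("ex:order42", "ex:placedBy", "ex:customer7"),
    Triple("ex:customer7", "rdf:type", "ex:Customer"),
    Triple("ex:customer7", "ex:name", "Acme Corp"),
    # The vocabulary is described with triples too, so its meaning travels with the data.
    Triple("ex:placedBy", "rdfs:label", "was placed by"),
]


def describe(graph, subject):
    """Return every (predicate, object) pair stated about a subject."""
    return [(t.predicate, t.obj) for t in graph if t.subject == subject]


def label(graph, term):
    """Return a term's human-readable label if the data supplies one, else the term itself."""
    for t in graph:
        if t.subject == term and t.predicate == "rdfs:label":
            return t.obj
    return term


for predicate, obj in describe(graph, "ex:order42"):
    print("ex:order42", label(graph, predicate), obj)
```

A production system would use full URIs and a triple store or RDF toolkit. The plain tuples are only meant to show the shape of the data.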
Unstructured/Semi-structured and Structured Data
The prominence of Big Data in Data Management lies in the ability to act on real-time analytical insight and to consolidate all of one’s data in the process. The latter requires the sort of integration of unstructured and semi-structured data with structured (even legacy) data that Smart Data facilitates at rates commensurate with Big Data speeds. Smart Data can also account for the former because it is machine readable. With Machine Learning techniques, it also lets algorithms be built around the data itself. Thus Smart Data makes real-time action on data in motion possible for burgeoning Big Data applications such as the Internet of Things. These applications automate processes that would simply take too long otherwise.
Data Mining: Natural Language Processing
The integration aspect of Smart Data is also central to accelerating data mining across both unstructured and structured data. It allows organizations to discern patterns and ascertain relationships between data sources that they otherwise could not. Natural Language Processing plays a valuable role in this kind of aggregation and in creating business value from data. “Natural Language Processing text analytics is allowing us to extract facts, sentiment, relationships (pretty complicated structures) out of documents and intersect that, marry that up and integrate it with structured data,” Martin remarked.
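The sketch below is a deliberately toy version of that pipeline, assuming hypothetical documents and company records. A single regular expression stands in for real NLP relation extraction, which would rely on trained language models rather than a hand-written pattern. The extracted facts are emitted as triples and then linked to structured records by resolving company names to identifiers. That link lets a fact found in text be combined with an attribute that exists only in the structured data.

```python
import re

# Hypothetical structured records, keyed by company name.
companies = {
    "Acme Corp": {"id": "ex:company1", "sector": "Manufacturing"},
    "Beta Ltd": {"id": "ex:company2", "sector": "Logistics"},
}

# Hypothetical unstructured documents.
documents = [
    "Acme Corp acquired Beta Ltd in a deal announced on Monday.",
    "Analysts expect the merger to close next quarter.",
]

# Toy stand-in for NLP relation extraction: capitalized names around "acquired".
ACQUIRED = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired (?P<target>[A-Z]\w*(?: [A-Z]\w*)*)"
)


def extract_facts(text):
    """Yield (subject, predicate, object) triples found in free text."""
    for m in ACQUIRED.finditer(text):
        yield (m.group("buyer"), "ex:acquired", m.group("target"))


def integrate(documents, companies):
    """Link extracted facts to structured records by resolving names to IDs."""
    linked = []
    for doc in documents:
        for s, p, o in extract_facts(doc):
            if s in companies and o in companies:  # keep only entities we can resolve
                linked.append((companies[s]["id"], p, companies[o]["id"]))
    return linked


facts = integrate(documents, companies)
name_by_id = {rec["id"]: name for name, rec in companies.items()}
for s, p, o in facts:
    # "sector" comes only from the structured records, not from the text.
    print(name_by_id[s], "acquired a", companies[name_by_id[o]]["sector"], "company")
```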
Data Lakes
At the moment, Data Lakes are one of the most popular means of integrating structured data with unstructured Big Data. These repositories are typically built on Hadoop or another NoSQL option. They can house any sort of data for quick access without conventional Data Modeling and schema. Several vendors are beginning to deliver offerings that improve Data Governance for such platforms. Even so, Smart Data innately provides meaning to data as they are ingested, regardless of their volume. That ability can help organizations govern these repositories without forsaking the typical roles, responsibilities, and rules required for their long-term value. “People who want to build Data Lakes, for example, are going to need to use Smart Data to effectively manage a Data Lake,” Martin commented. “Otherwise you’re going to be left with what’s called a data swamp.”
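To make the lake-versus-swamp distinction concrete, here is a minimal sketch built around a hypothetical in-memory `Lake` class. Payloads are stored as-is, with no schema imposed. Each ingest also records metadata triples: the source, what the data describes, the owner, and the ingestion time. Those triples let datasets be found and governed later. The predicate names and dataset keys are invented for illustration and are not taken from any Hadoop or vendor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Lake:
    """Toy data lake: raw payloads stored as-is, plus a catalog of metadata triples."""
    objects: dict = field(default_factory=dict)
    catalog: list = field(default_factory=list)

    def ingest(self, key, payload, source, describes, owner):
        # Store the data without imposing a schema on it...
        self.objects[key] = payload
        # ...but record what it is, where it came from, and who is responsible for it.
        self.catalog += [
            (key, "ex:source", source),
            (key, "ex:describes", describes),
            (key, "ex:owner", owner),
            (key, "ex:ingestedAt", datetime.now(timezone.utc).isoformat()),
        ]

    def find(self, predicate, value):
        """Return the keys of datasets whose metadata has predicate == value."""
        return [s for s, p, o in self.catalog if p == predicate and o == value]


lake = Lake()
lake.ingest("clickstream/march.json", b'{"page": "/home"}', "web-logs", "ex:Customer", "marketing")
lake.ingest("sensors/line4.csv", b"t,temp\n0,21.5\n", "plant-iot", "ex:Machine", "operations")

print(lake.find("ex:describes", "ex:Customer"))  # ['clickstream/march.json']
```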
Access Control, Data Governance, and Metadata
Smart Data increases the utility of Data Lakes by giving meaning to unstructured or semi-structured Big Data. In tandem with that, it can help clarify the role-based access that is a pillar of proper Data Governance. Smart Data clarifies what data mean and therefore how they relate to the enterprise or to specific business units. In doing so, it helps counter one of the most longstanding negative perceptions of Big Data: that its speed and size are too great for effective management. The key to Smart Data’s efficacy with access control is its relationship to metadata. Martin noted:
“Where Smart Data helps you handle Big Data is by providing both a description of what all that data means—so metadata describing specifically what the data defines itself as—but also any amount of metadata that might be important to the context of that information, as well as metadata used to link that information if it’s coming from multiple files.”
Martin also suggested that when organizations have integrated data that was previously in silos, metadata can be used to restrict access and to support regulatory compliance: “Solving those kind of problems is going to be one of the next big steps for Smart Data. Figuring out how to basically provide the metadata you need to do the access control.”
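A minimal sketch of metadata-driven access control follows, assuming hypothetical facts, sources, classifications, and roles. Each integrated fact keeps a pointer to its source, and metadata classifies the sources. A role policy then decides which classifications a user may see. This shows one simple way such a scheme could work, not how any particular product implements it.

```python
# Hypothetical integrated facts; each keeps a pointer to the source it came from.
facts = [
    ("ex:emp1", "ex:name", "Dana", "hr-system"),
    ("ex:emp1", "ex:team", "Analytics", "hr-system"),
    ("ex:emp1", "ex:salary", "85000", "payroll"),
]

# Metadata about the sources themselves.
source_metadata = {
    "hr-system": {"classification": "internal"},
    "payroll": {"classification": "restricted"},
}

# Hypothetical policy: which classifications each role may read.
role_policy = {
    "analyst": {"internal"},
    "hr-manager": {"internal", "restricted"},
}


def visible_facts(role, subject):
    """Return the facts about a subject that a role is allowed to see."""
    allowed = role_policy.get(role, set())  # unknown roles see nothing
    return [
        (s, p, o)
        for s, p, o, src in facts
        if s == subject and source_metadata[src]["classification"] in allowed
    ]


print(visible_facts("analyst", "ex:emp1"))
print(visible_facts("hr-manager", "ex:emp1"))
```

Because the policy is keyed on metadata rather than on individual tables or applications, data from a newly integrated silo is covered once its source is classified.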
Smart Data Modeling: Preferred Analytics
The modeling required for Smart Data offers a degree of agility and flexibility that vastly exceeds that of non-Semantic data. With non-Semantic data, Data Modelers have to determine, before the model is created, every question the model will answer and how it will relate to specific facets of known data types. Such a process is not only arduous and time consuming. It also makes it difficult to add new sources or to change a model’s requirements. According to Martin, modeling for Smart Data is less rigid and yields significant advantages, especially when incorporating additional sources of Big Data.
“In the case of Smart Data you simply create a model that represents the data as truly as possible in its real form, and then you map your real data onto that model. And then you can ask a question—any type of question—mapped across the data. The more that you expose, the more questions you can ask. And you don’t know all the questions you’re going to ask because the deal here is pretty much answer anything that you like.”
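The sketch below illustrates the workflow Martin describes, under stated assumptions. Two hypothetical sources use their own field names: CRM rows and support tickets. Each is mapped onto one shared model of classes and predicates. A question neither source was designed to answer is then asked across the combined data. All names (`MODEL`, `map_crm`, `map_ticket`, the `ex:` terms) are illustrative.

```python
# Hypothetical shared model: the classes and predicates that describe the domain.
MODEL = {
    "ex:Customer": {"ex:name", "ex:region"},
    "ex:Ticket": {"ex:about", "ex:status"},
}
ALLOWED = {"rdf:type"}.union(*MODEL.values())

# Two hypothetical sources, each with its own field names.
crm_rows = [
    {"cust_id": "C1", "cust_name": "Acme Corp", "territory": "EMEA"},
    {"cust_id": "C2", "cust_name": "Beta Ltd", "territory": "APAC"},
]
ticket_docs = [
    {"ticket": "T9", "customer": "C1", "state": "open"},
    {"ticket": "T10", "customer": "C2", "state": "closed"},
]


def map_crm(row):
    """Map one CRM row onto the shared model."""
    s = f"ex:cust/{row['cust_id']}"
    return [(s, "rdf:type", "ex:Customer"),
            (s, "ex:name", row["cust_name"]),
            (s, "ex:region", row["territory"])]


def map_ticket(doc):
    """Map one support ticket onto the shared model."""
    s = f"ex:ticket/{doc['ticket']}"
    return [(s, "rdf:type", "ex:Ticket"),
            (s, "ex:about", f"ex:cust/{doc['customer']}"),
            (s, "ex:status", doc["state"])]


graph = [t for r in crm_rows for t in map_crm(r)] + [t for d in ticket_docs for t in map_ticket(d)]
assert all(p in ALLOWED for _, p, _ in graph), "a mapping used a predicate outside the model"


def objects(s, p):
    return [o for s2, p2, o in graph if s2 == s and p2 == p]


def subjects(p, o):
    return [s for s, p2, o2 in graph if p2 == p and o2 == o]


# An ad hoc question neither source was built for: customers with an open ticket.
open_ticket_customers = sorted(
    name
    for t in subjects("ex:status", "open")
    for c in objects(t, "ex:about")
    for name in objects(c, "ex:name")
)
print(open_ticket_customers)  # ['Acme Corp']
```

Adding a third source would mean writing one more mapping function onto the same model. The existing mappings and the query helpers would not need to change.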
Transforming Transactions: The Internet of Things
Aside from Cognitive Computing and Artificial Intelligence, the foreseeable final frontier for Big Data revolves around the Internet of Things. It also covers the analytics opportunities and Big Data applications that all of its real-time, continuous connectivity will provide. Semantic technology provides the very means by which those devices can communicate with one another and understand the meaning of many different types of data. Smart Data will therefore be at the core of this Big Data application. Martin believes it will ultimately influence the way transactional data is handled in the era of Big Data. Smart Data also brings advantages for analytics, application development, data integration, and Big Data Governance. Together with those advantages, its reconfiguring of transactional data all but ensures that Big Data will evolve into Smart Data. Martin reasoned:
“It’s just a question of the maturity of stacks that can handle Smart Data. Once they are [in place] and once the performances are there and people start having new ideas with what they can do with them I think we’re going to see a whole new class of transactions and they’re going to be context oriented.”