
The Challenges of Master Data Management in the Internet of Things

December 3, 2014

by James Kobielus

Master data management (MDM) will come to the Internet of Things (IoT), if it hasn’t already.

MDM refers to governance over your core enterprise data of record. Does IoT data fit that description? Well, if you’ve implemented IoT in an industrial “sensor Internet” that keeps track of the identities, configurations, behaviors, and status of distributed manufacturing and logistics assets, then you’re probably keeping master databases for all of that. You have every operational need to keep authoritative, consolidated, current, and internally consistent systems of record on all these assets. You do this in order to have certainty over where it all is and how it’s performing at every point in time.

But as IoT spreads to every corner of the enterprise and throughout your value chain, you will have less and less certainty, and far messier asset-management databases, to keep you current on what exactly is out there. IoT MDM will be playing a perennial catch-up game to discover what “things” (i.e., connected sensors, actuators, and embedded intelligence) have been installed in the various buildings, equipment, vehicles, and other assets within your extended enterprise. Many of these “things” will be undocumented, ad-hoc (and, possibly, rogue) local retrofits to existing facilities, often without any central coordination.

From a centralized IoT MDM perspective, you’ll become aware of these undocumented things by their consequences. In other words, you’ll know them by the fact that suddenly, some mysterious new device in one of your overseas plants is starting to generate machine data that is being transmitted to your warehouses, automatically triggering business processes to arrange pickups, deliveries, and so forth. And you’ll know that other mysterious messages are coming from other undocumented things in that same facility, but you won’t have a clue whether their sources are separate physical machines or simply subcomponents of the same machine. And you won’t know for sure if the processes described in their messages are all pertaining to the same machine or process within that facility, or to entirely separate entities. You’ll be clueless.
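One way to picture this "know them by their consequences" situation is a check of incoming message sources against the master registry. The following is only a minimal sketch: the registry contents, the `source_id` field, and the function name `find_undocumented_sources` are all hypothetical and chosen for illustration, not drawn from any particular MDM product.

```python
from collections import Counter

# Hypothetical master registry: device IDs the central MDM system already knows.
KNOWN_DEVICES = {"press-001", "conveyor-007"}


def find_undocumented_sources(messages, known_devices):
    """Count messages per source ID that is absent from the master registry."""
    unknown = Counter()
    for msg in messages:
        source = msg["source_id"]
        if source not in known_devices:
            unknown[source] += 1
    return dict(unknown)


# Illustrative machine-data messages; the field names are assumptions.
messages = [
    {"source_id": "press-001", "event": "cycle_complete"},
    {"source_id": "unk-3f9a", "event": "pickup_request"},
    {"source_id": "unk-3f9a", "event": "pickup_request"},
    {"source_id": "unk-77c2", "event": "temperature"},
]

undocumented = find_undocumented_sources(messages, KNOWN_DEVICES)
print(undocumented)
```

Note what a check like this cannot tell you: whether `unk-3f9a` and `unk-77c2` are separate physical machines or subcomponents of the same one. That is the identity-resolution problem.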

Ideally, you’d want an IoT MDM environment that does what product data management (PDM) MDM applications have been doing for years. In other words, you’d need a common matching and search engine that employs advanced statistical techniques to automatically resolve device identity and attribute issues. Essentially, PDM systems do this by extracting data from multiple source systems (such as IoT endpoints) and using this as the basis for data profiling, resolution, matching, reconciliation, de-duplication, correction, and enhancement, prior to loading the transformed data into configuration management databases and other systems of record.
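To make the extract, normalize, and de-duplicate steps concrete, here is a deliberately simplified sketch. It uses plain rule-based normalization rather than the advanced statistical techniques described above, and every field name (`serial`, `vendor`, `model`) and helper function is an assumption made for illustration.

```python
def normalize(record):
    """Canonicalize fields so the same device reported two ways compares equal."""
    return {
        "serial": record["serial"].strip().upper().replace("-", ""),
        "vendor": record["vendor"].strip().lower(),
        "model": record.get("model", "").strip().lower(),
    }


def deduplicate(records):
    """Merge records sharing a (vendor, serial) key.

    Empty fields in the first-seen record are filled from later duplicates.
    """
    merged = {}
    for raw in records:
        rec = normalize(raw)
        key = (rec["vendor"], rec["serial"])
        if key not in merged:
            merged[key] = rec
        else:
            for field, value in rec.items():
                if not merged[key][field]:
                    merged[key][field] = value
    return list(merged.values())


# Two hypothetical source systems describing overlapping devices.
source_a = [{"serial": "ab-1234", "vendor": "Acme ", "model": ""}]
source_b = [
    {"serial": "AB1234", "vendor": "acme", "model": "TX-9"},
    {"serial": "zz-0001", "vendor": "Globex", "model": "S1"},
]

master = deduplicate(source_a + source_b)
print(master)
```

In a real pipeline the merged records would then be loaded into a configuration management database or other system of record; that step is omitted here.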

PDM systems can use either deterministic or probabilistic algorithms to accomplish this feat: deterministic when the rules for matching disparate items are clear-cut and unambiguous (enabling 100 percent accuracy) and probabilistic when the rules are fuzzier and the possibility of matching errors (i.e., false negatives or false positives) exists. In the burgeoning IoT arena, the sheer diversity of vendors, devices, specifications, and interfaces, along with the lack of universal standards, will make probabilistic device-matching, based on the myriad commercial machine-data output formats, the most feasible PDM approach for IoT for many years to come.
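The contrast between the two approaches can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular PDM product works: the deterministic rule is exact equality on a serial number, and the "probabilistic" side is approximated by a weighted string-similarity score built on Python's standard `difflib.SequenceMatcher`, which is a much simpler stand-in for a true statistical record-linkage model. The field names, weights, and threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b):
    """Exact rule: records match only if their serial numbers are identical."""
    return a["serial"] == b["serial"]


def similarity(x, y):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


# Illustrative field weights (assumed, not tuned); they sum to 1.0.
WEIGHTS = {"vendor": 0.3, "model": 0.3, "location": 0.4}


def match_score(a, b):
    """Weighted average of per-field similarities."""
    return sum(w * similarity(a[f], b[f]) for f, w in WEIGHTS.items())


def probabilistic_match(a, b, threshold=0.8):
    """Fuzzy rule: declare a match when the weighted score clears a threshold."""
    return match_score(a, b) >= threshold


a = {"serial": "AB1234", "vendor": "Acme Corp", "model": "TX-9",
     "location": "Plant 4, Line 2"}
b = {"serial": "ab-1234", "vendor": "ACME Corp.", "model": "TX9",
     "location": "Plant 4 Line 2"}
c = {"serial": "ZZ0001", "vendor": "Globex", "model": "S1",
     "location": "Warehouse 12"}

print(deterministic_match(a, b))   # False: the serials are formatted differently
print(probabilistic_match(a, b))   # True: the other fields are near-identical
print(probabilistic_match(a, c))   # False: the records have little in common
```

The threshold is where false positives and false negatives get traded off: raise it and you miss genuine matches, lower it and you merge devices that are actually distinct.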

Fortunately for enterprise IT professionals, you can begin to support IoT PDM requirements using established MDM best practices and mature MDM tools. Here, from 2007, is a good discussion of the differences between deterministic and probabilistic matching.

For IoT MDM, the keys are vendors’ mature support for unstructured data sources, disparate machine-data formats, adaptive machine-learning algorithms, and contextual identity resolution. Just as important, these solutions will require petabyte scaling and in-Hadoop analytics features to handle the staggering volume, velocity, and variety of the IoT MDM workloads that are sure to come online in enterprise clouds within the next several years.

About the author

James Kobielus, Wikibon, Lead Analyst

Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his 5-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.

  • Ajay Raina

    Love this article. I would absolutely think that some form of device master is needed, one that can be derived from the product master data pattern and extended for matching, merging, and de-duplication across other forms of data repositories (NoSQL, time graph, HBase, HDFS) for different types of usage.
