Revamping Master Data Management with Graphs


By Jelani Harper

One can speak volumes about the impact of graph databases on Master Data Management (MDM) systems. Nonetheless, Franz CEO Jans Aasman has summed up the advantages of a Semantic approach to Big Data, visually represented in MDM via graphs, in a single word: complexity.

“It’s about the complexity in your data,” Aasman said. “If there are networks in your data or terminology complexity or other kinds of complexity, then it’s far more straightforward to use a graph database than a relational database.”

Graph databases are able to account for MDM complexity and inherently reduce it due to their fundamental strength of representing relationships in a Semantic way that easily links any number of products, customers, and attributes in these hubs.

Visual Representation

MDM systems carry a considerable degree of complexity in a business climate impacted by Big Data, especially systems centered on customer domains. Numerous external sources (including social media and various forms of sentiment analysis) considerably complicate key relationships for products and customers. Deploying a graph database, such as Franz’s AllegroGraph, with MDM can simplify these relationships by visually representing the way that different categories of an organization’s core business, based on ontologies, relate to one another. According to Aasman: “You can look at your screen and see, literally, some data on your screen and you can say well, this is the pattern I want to know. You click on each of these things and turn it to query.”

Enhanced Query Capabilities

Semantic graph databases rely on ontologies (which Aasman categorized as “formal, self-describing descriptions of schemas”) and their components: classes, sub-class relationships, and predicates. Predicates are the attributes associated with an ontology; classes and predicates can contain copious amounts of metadata. Ontologies can describe objects in data of any type or variety. Those objects are visually represented in a graph database and are used to facilitate the query process. “You can actually look at them in a graph,” Aasman said. “You can see each element.”
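To make classes, sub-class relationships, and predicates concrete, here is a minimal sketch in plain Python. It is not AllegroGraph's API; the triple layout and every class, predicate, and instance name in it are assumptions chosen for illustration.

```python
# Illustrative only: a tiny in-memory triple store, not AllegroGraph's API.
# All class, predicate, and instance names below are hypothetical.
TRIPLES = {
    ("Customer", "subClassOf", "Party"),
    ("PremiumCustomer", "subClassOf", "Customer"),
    ("alice", "type", "PremiumCustomer"),
    ("bob", "type", "Customer"),
    ("alice", "purchased", "router-x1"),
    ("router-x1", "type", "Product"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def subclasses_of(cls, triples=TRIPLES):
    """Collect cls plus every class below it by following subClassOf edges."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for child, _, _ in match((None, "subClassOf", parent), triples):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def instances_of(cls, triples=TRIPLES):
    """Instances typed with cls or with any of its subclasses."""
    classes = subclasses_of(cls, triples)
    return sorted(s for s, p, o in triples if p == "type" and o in classes)


print(instances_of("Party"))
print(match(("alice", "purchased", None)))
```

Asking for instances of the hypothetical `Party` class returns customers typed only as `Customer` or `PremiumCustomer`, because the query follows the sub-class edges rather than requiring an explicit join per level.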

Most importantly, issuing queries based on graphic representations is one of the fundamental ways that graph databases simplify potentially complicated MDM queries. “We can do queries that are almost impossible to do in SQL,” Aasman said. “It’s about the complexity.” It’s also about ease of use: once that complexity is deconstructed, queries return in a timely manner in processes that would take significantly longer in environments that do not leverage Semantic graphs.


Another way in which graph databases can improve MDM systems applies to those that are stitched together with various technologies and tools throughout the enterprise. For instance, objects that are created in a graph can be mapped to external entities to help facilitate action and to extend the complexity that a graph query can handle. “You can have a mapping procedure that links to an enterprise data warehouse and declaratively specify how to transform ETL data from a data warehouse into your graph model,” Aasman noted. Such a mapping process involves creating declarative objects that specify procedures based on information found in the graph database and possibly relate them to a data warehouse. Thus, organizations will have a graph of the schema of their MDM, while “the mapping from the values in the graph are done by another graph object that takes you to the enterprise data warehouse or another database,” Aasman said. The result is the ability to derive action from MDM based on graph databases.
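A declarative mapping of this kind can be sketched as data plus one generic function that applies it. The following is an assumption-laden illustration in plain Python, not Franz's mapping format: the mapping keys, column names, predicates, and sample rows are all hypothetical.

```python
# Hypothetical mapping spec: it declares which warehouse column becomes
# which graph predicate. None of these names come from a real product.
CUSTOMER_MAPPING = {
    "subject_column": "customer_id",
    "subject_prefix": "customer/",
    "rdf_type": "Customer",
    "predicates": {"full_name": "hasName", "region_code": "locatedIn"},
}


def rows_to_triples(rows, mapping):
    """Apply a declarative mapping to warehouse rows and return triples."""
    triples = []
    for row in rows:
        subject = mapping["subject_prefix"] + str(row[mapping["subject_column"]])
        triples.append((subject, "type", mapping["rdf_type"]))
        for column, predicate in mapping["predicates"].items():
            value = row.get(column)
            if value is not None:  # skip missing values rather than emit nulls
                triples.append((subject, predicate, value))
    return triples


# Made-up rows standing in for an ETL extract from a data warehouse.
warehouse_rows = [
    {"customer_id": 17, "full_name": "Ada Example", "region_code": "EU"},
    {"customer_id": 18, "full_name": "Lin Sample", "region_code": None},
]

for triple in rows_to_triples(warehouse_rows, CUSTOMER_MAPPING):
    print(triple)
```

Because the mapping is data rather than code, supporting another source table means adding another mapping object, not another transformation routine.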

Natural Language Processing

One of the more interesting aspects of utilizing graph databases with MDM is the role that Natural Language Processing (NLP) can play in the query process. The visual querying framework that Semantic graphs facilitate was described by Aasman as “even simpler than natural language”, especially because the former method does not involve code. Still, there are ways in which NLP can assist with the querying process for MDM systems augmented by graph databases. The most salient of these arise when NLP handles definitions and descriptions of terms that are referred to with multiple spellings, nicknames, and perhaps even slang.

One of the most cogent examples of this is a use case in which Franz partnered with Montefiore Medical Center to create a healthcare platform with instantaneous querying capabilities across vastly heterogeneous sources. When performing highly specific queries involving facets of distinct medical conditions such as asthma and peanut allergies, it was necessary both to link to a repository that contained an immense number of term definitions and to link to the various objects involved in the query. Aasman commented that:

“Peanut allergies has so many different spellings and there’s so many different types of asthma. The reason why we even have a chance of doing this is we have a combined terminology system with about 300 million preferred labels for alternative spellings for certain concepts. We had the relationships between concepts like this is a higher level of this word, and this is an even higher level. When you do Natural Language you have to look at each word and you have to kind of come to a generic concept.”
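The idea Aasman describes, many alternative spellings resolving to one concept and each concept pointing to progressively more generic ones, can be sketched as two lookup tables. This is a toy illustration in plain Python; the labels, concept names, and hierarchy are assumptions, not entries from the Montefiore terminology system.

```python
# Hypothetical terminology graph: alternative labels point to a preferred
# concept, and "broader" edges point to the next more generic concept.
ALT_LABELS = {
    "peanut allergy": "PeanutAllergy",
    "peanut allergies": "PeanutAllergy",
    "allergy to peanuts": "PeanutAllergy",
    "exercise-induced asthma": "ExerciseInducedAsthma",
    "asthma": "Asthma",
}
BROADER = {
    "PeanutAllergy": "FoodAllergy",
    "FoodAllergy": "Allergy",
    "ExerciseInducedAsthma": "Asthma",
    "Asthma": "RespiratoryDisease",
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def resolve(term):
    """Map a free-text term to its preferred concept, or None if unknown."""
    return ALT_LABELS.get(normalize(term))


def generalizations(concept):
    """Walk the broader edges upward, from most specific to most generic."""
    chain, seen = [], {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


concept = resolve("  Peanut   Allergies ")
print(concept, generalizations(concept))
```

A query phrased with any of the listed spellings resolves to the same concept, and the broader chain lets it be widened to a more generic concept when needed.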

Electronics Use Case

Another example that illustrates the efficacy of graph databases with MDM involves Franz’s work with a well-known company in the electronics industry. The company was looking to pool data from a variety of different systems including ERP and separate databases for inventory, customer equipment settings, and troubled equipment. All these databases were silos. Worse, Aasman observed, “They were storing about 10 times the inventory than they should [have], because they just couldn’t do their predictions well enough.” To create an MDM solution, Franz had to create an MDM schema based on the schemas of each of the databases, as well as use graphs to define relationships between objects in each database. Aasman mentioned:

“You can imagine that you almost have a schema space on top of all your relational databases, that now allow you to walk one graph from another graph. When I say graph I mean local schema for a local database. They’re all linked together now.”

Similar to the solution in the Montefiore Medical Center use case, the solution with the electronics customer also involved a terminology repository for different spellings and references to what amounted to the same object. Without such a repository, an MDM system would not work because there would be too much ambiguity in the underlying data. “So you needed two graphs [in the MDM system]: a schema graph and a terminology graph. Then you can do schema and then you do queries that combine everything together,” Aasman said.
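One way to picture the two graphs working together is a sketch in which a terminology lookup normalizes the search term and a schema graph supplies the chain of linked local schemas to traverse. The code below is plain Python under stated assumptions: the table names, the links between them, and the part terms are hypothetical and do not describe the electronics company's actual systems.

```python
from collections import deque

# Hypothetical schema graph: each node is a "database.table" local schema,
# and each edge says two local schemas share a linkable object.
SCHEMA_LINKS = {
    "erp.orders": ["inventory.parts"],
    "inventory.parts": ["erp.orders", "service.faults"],
    "service.faults": ["inventory.parts", "settings.equipment"],
    "settings.equipment": ["service.faults"],
}

# Hypothetical terminology graph: different spellings of the same object.
PART_TERMS = {
    "psu": "PowerSupply",
    "power supply": "PowerSupply",
    "pwr supply": "PowerSupply",
}


def schema_path(start, goal, links=SCHEMA_LINKS):
    """Breadth-first search for the shortest chain of linked local schemas."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two schemas are not connected


def plan_query(raw_term, start, goal):
    """Combine both graphs: normalize the term, then find the schema walk."""
    concept = PART_TERMS.get(raw_term.strip().lower())
    return {"concept": concept, "path": schema_path(start, goal)}


print(plan_query("PSU", "erp.orders", "settings.equipment"))
```

The terminology graph removes ambiguity about what is being asked for, while the schema graph determines which silos have to be walked to answer it.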

Going Forward

The most distinct advantage of augmenting MDM systems with graph databases is reducing the complexity of integrating and querying vastly different source data. The visual representations of data objects that graphs provide play a considerable role in simplifying queries of different data types, enabling users to discern patterns and relationships that otherwise might not be discernible. The benefits of graph databases with MDM also include the ability to map to other data sources and to leverage NLP for further clarification purposes.
