Master Data Management Strategies for Big Data

February 13, 2014

by Jelani Harper

In addition to improving the Return On Investment (ROI) of Big Data initiatives, organizations have several reasons to enhance their Master Data Management (MDM) with Big Data:

  • More Effective Analytics: Linking an MDM system to Big Data can provide the basic framework for performing analytics, as the former offers essential information about customers or products that reduces the scope of analytics and yields greater insight. Otherwise, users run the risk of randomly querying Big Data in search of the proverbial needle in the haystack.
  • Enforced Governance: The Metadata, cleansing, and data quality standards of MDM hubs (which often house the enterprise’s most reliable and important data) are ideal for identifying which Big Data is appropriate and which should be discarded. Better governance considerably helps in the “taming” of unstructured or semi-structured data from lesser-known public sources.
  • Expanding the Truth: While Master Data is generally considered the closest organizations come to the elusive “single” version of the truth, supplementing what is typically relational data with unstructured data provides a more comprehensive view of key facets of customer (and product) behavior, which directly assists in generating future business value.

Ultimately, Big Data’s impact on MDM gives the latter the opportunity to move beyond its static role as a data repository. MDM becomes a point from which to extend its governance and quality benefits to additional segments of the enterprise, but only if organizations implement the correct strategies to yield these benefits.

Driving Big Data Analytics

Organizations now have both Big Data and Master Data, but what is the best way to use them together? It appears more viable to use Master Data to inform Big Data analytics than to use Big Data expressly to add to a Master Data repository, although the first option will inevitably enhance an enterprise’s Master Data assets as well. Basing queries on Master Data tailors results so that they align with business objectives. In this way MDM functions as a driver for Big Data, because the specific customers and products it describes give Big Data analysis a target.

Business Intelligence (BI) tools, in turn, can parse Big Data according to information found in MDM hubs to identify points of relevance. BI search capabilities can be significantly enhanced by automating the application of Metadata related to germane Master Data, either at the time a query is issued or at the point the data is stored. Both methods expand the utility of Master Data by aiding the analytics of Big Data, expediting time to insight, and subsequently enriching MDM hubs.
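
The two timing options described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the MASTER_CUSTOMERS dictionary stands in for an MDM hub, and the Record class, match_customer, tag_at_storage, and search_at_query are hypothetical names, not part of any MDM or BI product. Matching is done by naive name lookup, which a real system would replace with proper entity resolution.

```python
from dataclasses import dataclass, field

# Hypothetical master data: customer IDs mapped to governed attributes.
MASTER_CUSTOMERS = {
    "C001": {"name": "Acme Corp", "segment": "enterprise"},
    "C002": {"name": "Bolt Ltd", "segment": "smb"},
}


@dataclass
class Record:
    """An unstructured record plus any master metadata attached to it."""
    text: str
    tags: dict = field(default_factory=dict)


def match_customer(text):
    """Return the master ID whose customer name appears in the text, or None."""
    lowered = text.lower()
    for cust_id, attrs in MASTER_CUSTOMERS.items():
        if attrs["name"].lower() in lowered:
            return cust_id
    return None


def tag_at_storage(text):
    """Storage-time option: attach master metadata when the record is stored."""
    record = Record(text)
    cust_id = match_customer(text)
    if cust_id is not None:
        record.tags = {"customer_id": cust_id, **MASTER_CUSTOMERS[cust_id]}
    return record


def search_at_query(raw_texts, segment):
    """Query-time option: resolve master metadata only when a query is issued."""
    hits = []
    for text in raw_texts:
        cust_id = match_customer(text)
        if cust_id is not None and MASTER_CUSTOMERS[cust_id]["segment"] == segment:
            hits.append(text)
    return hits


raw = [
    "Acme Corp praised the new release",
    "Unrelated chatter about the weather",
    "Support ticket opened by Bolt Ltd",
]

# Storage-time tagging: filter later on the stored tags.
stored = [tag_at_storage(t) for t in raw]
enterprise_stored = [r.text for r in stored if r.tags.get("segment") == "enterprise"]

# Query-time tagging: match against master data while searching.
enterprise_query = search_at_query(raw, "enterprise")
```

Both paths return the same record here. The trade-off is that storage-time tagging pays the matching cost once per record, while query-time tagging always reflects the current state of the master data.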

Moreover, utilizing semantics technologies and text analytics assists in the integration of MDM and Big Data. According to an Oracle posting:

“Businesses that have embraced MDM to get a single, enriched and unified view of Master data by resolving semantic discrepancies and augmenting the explicit master data information from within the enterprise with implicit data from outside the enterprise like social profiles will have a leg up in embracing Big Data solutions.”

Integration Options

The foremost concern in integrating Master Data with Big Data sources is ensuring that the quality and governance standards of the former augment the latter, rather than letting the lack of such standards in Big Data exacerbate quality issues in diligently maintained MDM systems. The most effective means of achieving this objective include:

  • Data Virtualization: Virtualization layers are well suited to abstracting data across sources without moving the data from its physical location. They offer organizations integrated views of different data sources and querying in close to real time. Business rules can be implemented at this layer to ensure data quality.
  • Contemporary MDM Platforms: A recent Gartner report states, “By 2017, 35% of MDM software sales prospects will purchase based on candidate vendors’ solutions for linking structured Master Data to Big Data sources.” Vendors provide no shortage of options for doing so, including devising software to conform Big Data to MDM characteristics, implementing layers specifically for business requirements and analytics, provisioning links to common Big Data file systems and warehouses, and linking content to metadata dictionaries for indexing.
  • Hadoop: Hadoop (through SQL-on-Hadoop engines such as Hive) and certain NoSQL offerings for Big Data allow SQL access, so that users can query and interact with data in a language closer to that of most Master Data systems. This enables greater control and conformity of Big Data from a governance perspective. It is also possible to build tailored apps on these systems that retrieve both structured and unstructured data for certain domains, such as customers.
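
As a rough illustration of the virtualization and SQL-access options above, the sketch below uses Python's built-in sqlite3 module purely as a stand-in for a virtualization layer or SQL-on-Hadoop engine. The table names (master_customer, clickstream), the view name (governed_clicks), and the sample rows are all hypothetical. The view exposes an integrated result without copying rows into a new table, and its join acts as a simple business rule: events that do not resolve to a mastered customer are left out.

```python
import sqlite3

# In-memory database standing in for two separate sources behind one SQL layer.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical master data table (the governed source).
    CREATE TABLE master_customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT,
        segment     TEXT
    );
    -- Hypothetical Big Data feed (ungoverned events).
    CREATE TABLE clickstream (customer_id TEXT, page TEXT);

    INSERT INTO master_customer VALUES
        ('C001', 'Acme Corp', 'enterprise'),
        ('C002', 'Bolt Ltd',  'smb');
    INSERT INTO clickstream VALUES
        ('C001', '/pricing'),
        ('C001', '/docs'),
        ('C002', '/pricing'),
        ('C999', '/pricing'),   -- unknown customer ID
        (NULL,   '/home');      -- anonymous event

    -- The view is the "virtual" integrated source: no rows are copied,
    -- and the inner join keeps only events tied to a mastered customer.
    CREATE VIEW governed_clicks AS
        SELECT m.customer_id, m.name, m.segment, c.page
        FROM clickstream AS c
        JOIN master_customer AS m ON m.customer_id = c.customer_id;
    """
)

# Analysts query the view in plain SQL, scoped by master data attributes.
rows = conn.execute(
    "SELECT name, COUNT(*) FROM governed_clicks GROUP BY name ORDER BY name"
).fetchall()
print(rows)
```

Whether unmatched events should be dropped, quarantined, or routed to a stewardship queue is a governance decision; the inner join here is only the simplest possible choice.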


Effective Data Governance has always played a vital role in Master Data and MDM systems, as this data is defined by and adheres to canonical business rules with clear denotations regarding Metadata, ownership, authority, and quality. Common inconsistencies which may appear in other systems (redundancies and inaccuracies) are largely accounted for and removed in MDM hubs, so that there is ideally an innate sense of trust for their data.

The shift towards incorporating Big Data into business and operational processes requires Master Data’s role to evolve from merely a collection of the most useful data into a tool for extending model governance standards to unstructured data. The value MDM produces for Big Data initiatives goes beyond analytics to shaping the reliability of Big Data itself, which at its point of origin is intrinsically less credible than data stemming from proprietary sources.

The pairing of Master Data with Big Data enables enterprises to contextualize the latter within the scope of the former, so that the governance and quality standards of Master Data become the framework for the appropriation of Big Data. Big Data that fit the policies and definitions already existing in MDM solutions can add value more readily than Big Data that do not.
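
One way to picture Master Data standards acting as the framework for accepting Big Data is a small rule-based filter. The Python sketch below is illustrative only: the rules (a reference list of country codes, a loose email pattern, and duplicate removal) and the names VALID_COUNTRIES, EMAIL_PATTERN, normalize, and apply_governance are assumptions, not standards taken from any particular MDM product.

```python
import re

# Hypothetical governance definitions borrowed from the master data model.
VALID_COUNTRIES = {"US", "DE", "JP"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(record):
    """Cleanse a raw record into the canonical form the rules expect."""
    return {
        "email": record.get("email", "").strip().lower(),
        "country": record.get("country", "").strip().upper(),
    }


def apply_governance(records):
    """Split incoming records into accepted ones and rejected ones with a reason."""
    accepted, rejected = [], []
    seen_emails = set()
    for raw in records:
        rec = normalize(raw)
        if not EMAIL_PATTERN.match(rec["email"]):
            rejected.append((raw, "invalid email"))
        elif rec["country"] not in VALID_COUNTRIES:
            rejected.append((raw, "unknown country code"))
        elif rec["email"] in seen_emails:
            rejected.append((raw, "duplicate"))
        else:
            seen_emails.add(rec["email"])
            accepted.append(rec)
    return accepted, rejected


incoming = [
    {"email": " Ann@Example.com ", "country": "us"},
    {"email": "ann@example.com", "country": "US"},   # same person after cleansing
    {"email": "not-an-email", "country": "DE"},
    {"email": "bob@example.org", "country": "XX"},
]

accepted, rejected = apply_governance(incoming)
reasons = [reason for _, reason in rejected]
```

Keeping the rejection reason alongside each discarded record gives data stewards something to review, rather than silently dropping data that failed a rule.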

The Evolution of Master Data Management

Collectively, these individual MDM strategies for Big Data indicate a transformation in the applicability of MDM, which has historically focused on single domains for very specific segments of the enterprise. Its ability to refine analytics efforts across large volumes of disparate data may serve as a driver for Big Data initiatives. Various integration options enable proprietary, structured data and non-proprietary, unstructured data to be aggregated into a more comprehensive version of ‘the truth’. Master Data’s exemplary governance standards serve as a valuable starting point for enacting similar conformity for Big Data.

However, the governance effects of MDM for Big Data ultimately serve as a means of governing the relationships between various points of data, regardless of their origins. By relating all other data (Big Data or otherwise) to Master Data and its governance, organizations can more readily discern their data’s value and reinforce data integrity. A Forrester post describes the result as a multi-dimensional Master Data model, one that extends beyond a single customer or product domain and in which organizations:

“…move from a two-dimensional model to a multidimensional model of master data. Master data is all about the data model both in terms of relationships and hierarchies and how data elements are combined. Master data, metadata and references data converge under an MDM umbrella allowing for unlimited combinations determined by categories, definitions, and context.”
