Data Harmonization is an approach to Data Quality that is meant to improve the governance and usefulness of data across the enterprise. How does it do that? And how should a company go about implementing a Data Harmonization strategy?
To answer these questions, DATAVERSITY® spoke with Anil Kaul, co-founder and CEO of Absolutdata. Mr. Kaul was named one of the ten most influential Analytics Leaders in India. He has over two decades of experience in Data Analytics, market research, and management consulting. A respected writer and speaker, Mr. Kaul has a PhD and a Masters of Marketing from Cornell University.
An edited version of the conversation follows.
DATAVERSITY (DV): How would you define Data Harmonization?
Anil Kaul: Simply put, Data Harmonization is all about creating a single source of truth. It does this by taking data from disparate sources, clearing away any misleading or inaccurate items, and presenting it as a whole. This means you get a single window view of everything and anything that supports ongoing decision-making, including financial information and business performance. Data is coming at you from different sources, but once it’s harmonized, it’s been cleaned, sorted, and aggregated to provide a complete picture.
DV: According to Informatica’s definition of Data Harmonization, Machine Learning is a key part of the process. How do smart data tools like Machine Learning and Artificial Intelligence play into the process of Data Harmonization?
Anil Kaul: Machine Learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, Machine Learning allows computers to find hidden insights without being explicitly programmed where to look.
Growing volumes and varieties of available data, cheaper computational processing, and more powerful and affordable data storage and mining capabilities mean it’s possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results – even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities – or avoiding unknown risks. This is supported by Machine Learning tools and AI.
The data wrangling problem is growing as unstructured data and data in varying formats pour in from sensors, online sources, and traditional databases. All of this data must be cleaned up and organized before Data Analytics tools can be applied. This is where automation tools come into play. Automation and AI can help organize data from a variety of sources, then present the organized data in charts and graphs.
Artificial Intelligence can supplement these tools, making it easier to prepare and harmonize data and thereby speeding mainstream adoption of big data techniques. Indeed, the growing diversity of data coming from emerging networked sources like the Internet of Things is fueling demand for more and better automation tools.
DV: Is Data Harmonization just a dressed-up term for Master Data Management? What are the key differences?
Anil Kaul: Master Data Management (MDM) is a discipline that focuses on the management of Reference or Master Data that is shared by several disparate IT systems and groups. MDM provides access to an organization’s central data repository. It also tackles data issues by concentrating on the business processes, data quality, and the standardization and integration of information systems.
Data Harmonization goes one step further: it cleans the data to remove inconsistencies and inaccuracies across the various sources. It tries to create “harmony” between different sources of data to form a complete, cohesive picture. The Data Harmonization process is like joining the pieces of a puzzle so that they make sense: the different data elements and variables are identified, cleansed, and processed together to create a data store that facilitates decision making.
DV: What are the core benefits of successful Data Harmonization?
Anil Kaul: Data Harmonization enhances the quality and utility of business data by making it relevant to business needs. Data Harmonization also makes it possible for business users to transform data and create new data analyses and visualizations without IT involvement. You don’t have to wonder if you’re getting the whole picture. You can rely on the truth of your data and make stronger decisions.
At its simplest, Data Harmonization enhances the quality and utility of business data, which helps organizations deploy new and advanced techniques like Machine Learning, Artificial Intelligence, and Internet of Things quickly and inexpensively.
When you have a single source of truth that is updated either regularly or in real time, there is no need to spend time verifying, re-hashing, and tracking down multiple sources of data. The information is there, and you can act on it. This will make your company more agile and responsive to market changes. Data Harmonization significantly decreases the time to create and access Business Intelligence insights while also lowering the total cost of data analysis.
Suppose you start using harmonized data with one team. As your system evolves over time and accumulates additional relevant data inputs, it becomes relatively quick to replicate the process in other business areas. And as usage and adoption increase, the system becomes a robust knowledge repository.
DV: How should the Data Harmonization process begin? What are the basic steps of proper implementation?
Anil Kaul: The Data Harmonization process begins with defining organizational goals and objectives. Harmonization and research protocols are then established to support these objectives. An architecture covering the required IT systems and elements is designed so that data integration and harmonization can start. Basic steps of implementation include:
- Step 1: Identify the relevant sources of micro data and acquire the data to form data sets.
- Step 2: Clean and harmonize. The data cleaning process consists of identifying incorrect, inaccurate, or inconsistent parts of the data and modifying them. This is done in order to improve Data Quality and produce a clean, uniform, and consistent data set for harmonization.
- Step 3: A quality check is run on the data to make sure it has maintained an acceptable level of integrity and validity. Check for duplicate entries and for inconsistent or inapplicable data.
- Step 4: Variables are identified and selected for harmonization. This can be quite tricky, since variables from multiple sources are rarely uniform. A balance has to be struck between practicality (finding information that is similar and works together) and purity (information that corresponds exactly).
- Step 5: The data is processed, converted to a common format where needed, and pooled. Now, every part of an organization can access the same up-to-date data. It can be filtered and presented to suit each department’s needs.
After passing its final exams, the data can be stored for use as needed. Harmonized data is not static; it can be updated, either periodically or in real time, to ensure that it remains useful for each problem.
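To make the five steps concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a method described in the interview: the two sources (`crm` and `billing`), their field names, the date formats, and the helper names `clean` and `harmonize` are all hypothetical. For brevity, the variable mapping from Step 4 is defined up front and applied during cleaning.

```python
from datetime import datetime

# Step 1 (assumed data): raw records from two hypothetical sources that use
# different field names and formats.
CRM_ROWS = [
    {"CustomerID": " c001 ", "Revenue": "1,200.50", "Date": "2023-01-15"},
    {"CustomerID": "C002", "Revenue": "n/a", "Date": "2023-01-16"},      # invalid amount
    {"CustomerID": "C001", "Revenue": "1200.50", "Date": "2023-01-15"},  # duplicate
]
BILLING_ROWS = [
    {"cust": "C003", "amount_usd": "89.90", "billed_on": "17/01/2023"},
]

# Step 4: map each source's variables onto one shared (hypothetical) schema.
FIELD_MAP = {
    "crm": {"CustomerID": "customer_id", "Revenue": "amount", "Date": "date"},
    "billing": {"cust": "customer_id", "amount_usd": "amount", "billed_on": "date"},
}
DATE_FORMATS = {"crm": "%Y-%m-%d", "billing": "%d/%m/%Y"}


def clean(row, source):
    """Step 2: rename fields and normalize values; return None if unusable."""
    renamed = {FIELD_MAP[source][key]: value for key, value in row.items()}
    try:
        parsed_date = datetime.strptime(renamed["date"], DATE_FORMATS[source])
        return {
            "customer_id": renamed["customer_id"].strip().upper(),
            "amount": float(renamed["amount"].replace(",", "")),
            "date": parsed_date.date().isoformat(),  # Step 5: common format
            "source": source,
        }
    except ValueError:
        return None


def harmonize(sources):
    """Run every source through cleaning and checks, then pool the results."""
    pooled, rejected, seen = [], [], set()
    for source, rows in sources.items():
        for row in rows:
            record = clean(row, source)
            if record is None:
                rejected.append((source, row))  # keep bad rows for review
                continue
            key = (record["customer_id"], record["amount"], record["date"])
            if key in seen:  # Step 3: quality check for duplicate entries
                continue
            seen.add(key)
            pooled.append(record)  # Step 5: pool into one data set
    return pooled, rejected


pooled, rejected = harmonize({"crm": CRM_ROWS, "billing": BILLING_ROWS})
```

In this sketch, rows that fail cleaning are set aside rather than silently dropped, so someone can review them later. That is one possible design choice; a real pipeline might instead try to repair such rows or flag them for the source system's owner.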
DV: What advice would you give to a company exploring the possibilities of Data Harmonization?
Anil Kaul: Data Harmonization is the future because it complements and supports efficient data processing and decision making to ensure accuracy and reliability. In the coming years, Data Harmonization will become a prerequisite for business efficiency and success.
Data Harmonization is more than a gap filler. It helps to seamlessly tie the various data elements and variables into one data set that can be used as each function requires. Disparate, unrelated data takes time to process, which delays decision-making and makes it less precise. With sound data management and Data Harmonization, an organization can stay agile, stand out from the competition, and make relevant, well-optimized decisions.
The value of Data Harmonization is easy to see, but the path to implementation is much less clear. Mr. Kaul’s five steps lay a strong foundation for any business interested in pursuing the benefits of Data Harmonization, benefits that include faster and better Business Intelligence insights. But every business will have a unique method for taking each step and bringing Data Harmonization to fruition. That said, any method aimed at discovering a “single source of truth” is surely worth exploring.
Photo Credit: NPFire/Shutterstock.com