Data Quality in the Enterprise


Data Quality is the subject of many conference talks, backroom jokes, and late nights for professionals in the Data Management industry. Data technologies and data assets are meaningless without reliable Data Quality, yet the data acquisition, data storage, and data preparation methodologies that form the substance of Data Quality practices still receive too little attention.

Traditionally, Data Management gurus have been engrossed in refining Data Analysis and Reporting platforms (among others), and Data Quality has been easily overlooked. In recent years, low-cost data storage and data processing hardware, advanced Data Analytics platforms, Artificial Intelligence, the Internet of Things (IoT), and many other developments have together made it necessary to leverage an organization’s data assets through better Data Governance and Data Quality.

Data Scientists have realized that even with the best hardware and software, the brightest minds, and Advanced Analytics platforms, business outcomes are not improving enough because of poor data. The worldwide data community has therefore turned its attention to Data Governance (DG), Data Quality (DQ), and Data Stewardship (DS) with ever greater urgency, because stronger Data Management foundations are needed before advanced data platforms can be implemented.

The DATAVERSITY® Webinar on Data Quality highlights the importance of DQ in today’s business climate. The Forbes blog post titled The Importance of Data Quality: Good, Bad, or Ugly confirms that in a data-engulfed business culture, the only savior is Data Governance. In a report called “The Data Differentiator: How Improving Data Quality Improves Business,” Forbes and Pitney Bowes jointly examine the role of DQ in regulated and compliance-centric business sectors like finance, banking, drug manufacturing, and healthcare.

In any of these regulated sectors, poor Data Quality can lead to the loss of business licenses, lawsuits, and hefty fines. Today, even a huge business can suddenly be brought to its knees if it fails to meet DQ or DG requirements. Poor data can contain visible errors, incorrect facts, misleading statements, or references to products or services that do not exist.

The Importance of Clean Data

As most modern businesses rely on timely information, especially competitive intelligence, the difference between high-quality and low-quality data means the difference between survival and death. With the advancement of Big Data and allied technologies like Hadoop, the Cloud, and IoT, it is now possible for even mid-sized and smaller businesses to collect, store, and prepare clean data for competitive analytics. In the article titled Why Data Quality Is of Utmost Importance to Information-Centric Organizations, you will notice that Data Security and Privacy are also very important considerations in the overall Data Management strategies of an organization to ensure clean data.

In Take Enterprise Data Quality for Granted – at Your Peril, data experts unanimously agree that Data Governance is of the utmost importance to small data and Big Data environments alike. The current thinking is that data professionals must look beyond the age-old “Enterprise Data Warehouse” (EDW) and “extract, transform, and load” (ETL) approaches to Data Analysis. An article from Oracle also makes a strong case for Data Quality. Although that flyer promotes Oracle’s own data cleansing platform, the importance of clean data in any Data Management environment is undeniable.

The Data Differentiator

The article titled The Data Differentiator: How Improving Data Quality Improves Business discusses at length the joint study conducted by Forbes and Pitney Bowes to determine how Data Quality affects business performance across the industry spectrum. With business data pouring in through omni-channel paths, businesses now face the growing challenge of storing, cleaning, and managing that data. The report clearly warns that business owners and operators will have to assess which data criteria are important for their own operations, rather than being swayed by commercial data or data service providers.

Big Data, Hadoop, Cloud: Do They Affect Enterprise Data Quality?

As newer technologies like IoT and Platform-as-a-Service (PaaS) continue to surface on the global business horizon, business leaders are trying to figure out how to manage mammoth data disasters. In the DATAVERSITY® article titled How the New Data Technologies Are Affecting Data Quality and Data Governance, the author emphasizes the need for a strong Data Governance Framework to give enterprise Data Management a definite structure. Data Governance establishes Data Stewardship, which oversees the specific roles, responsibilities, and ownership related to enterprise Data Management. The article also recommends a “scalable security infrastructure” to maintain Data Security as data volumes and complexity grow.

On the one hand, today’s enterprises cannot ensure Data Quality without good Data Governance; on the other hand, technologies like Hadoop and NoSQL were never designed to contend with the Data Security issues that are commonplace today. While Big Data offers a huge opportunity for Data Quality Management, the allied technologies that support Big Data will have to be improved, enhanced, and prepared for optimum Data Security.

Unstructured Data Adds to the Complexity of Data Management

The famous data industry axiom “garbage in, garbage out” was routinely used as a warning to data professionals who used short-cut methods to acquire data and expected the technology to deliver miracles. Data has never been given as much importance as it is now, because the combined power of advanced hardware, sophisticated analytics platforms, the brightest Data Scientists, and petabytes of data has failed to deliver the expected results without good data. In the KDnuggets article titled Sisense Effective Data Preparation Quality Conclusions, the author argues that in an age of unstructured data, data has to be prepared properly before any kind of analytics can yield results.

The article offers a tutorial on data preparation by outlining six critical steps suggested by Sisense. As it points out, data preparation is the most time- and labor-intensive part of the Data Management process and can easily take up to 80 percent of the total time. Because unstructured data is best tackled with Big Data technologies, read the article titled Big Data Analytics Pain Points to understand how the steady growth in data volume since 2012, driven by social, sensor, transactional, and mobile data, has further complicated the characteristics of incoming data streams.
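To give a rough sense of what routine data preparation involves, here is a minimal Python sketch that profiles a small batch of records and then cleans it. It is a generic illustration, not the six Sisense steps referenced above. The rules (absent, None, or blank values count as missing; exact repeats count as duplicates), the flat-record format, and the function names `profile_records` and `clean_records` are all hypothetical assumptions made for this example.

```python
from collections import Counter

# Hypothetical illustration only: these rules are assumptions for this sketch,
# not a published data preparation method.


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and not value.strip())


def record_key(record):
    """Build a hashable key for a flat dict record (values must be hashable)."""
    return tuple(sorted(record.items()))


def profile_records(records, required_fields):
    """Count rows, missing required values per field, and exact duplicates."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            if is_missing(record.get(field)):
                missing[field] += 1
    counts = Counter(record_key(r) for r in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"rows": len(records), "missing": dict(missing), "duplicates": duplicates}


def clean_records(records, required_fields):
    """Keep the first copy of each record that has every required field."""
    cleaned, seen = [], set()
    for record in records:
        key = record_key(record)
        if key in seen or any(is_missing(record.get(f)) for f in required_fields):
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"id": "1", "email": "a@example.com"},
        {"id": "2", "email": ""},
        {"id": "1", "email": "a@example.com"},
        {"id": "3"},
    ]
    print(profile_records(sample, ["id", "email"]))
    # prints {'rows': 4, 'missing': {'email': 2}, 'duplicates': 1}
    print(clean_records(sample, ["id", "email"]))
    # prints [{'id': '1', 'email': 'a@example.com'}]
```

Profiling before cleaning is a deliberate choice in this sketch: it shows how much data a rule would discard before the rule is applied, which matters when, as in the sample, half of the rows fail a single check.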

As Ian Rowlands said in his DATAVERSITY® Enterprise Data World 2017 Conference presentation, Data Lineage is as important as Data Quality for business decision making, because the invisible, connected dots between diverse datasets determine the success of such decisions. In other words, industry leaders must be able to understand and use the inter-relationships between disparate datasets to make effective decisions.

Forrester Report for Data Quality Solutions

Oracle and many other organizations have made The Forrester Wave™: Data Quality Solutions, Q4 2015 publicly available through their sites. The report clearly states that enterprise leaders are focused on DQ because they consider Data Quality to be the key differentiator for business success. Many enterprises are looking for professional partners in Data Quality and Data Governance so that they can implement the right data strategies in their organizations. As technologies like Big Data, Cloud, and Machine Learning gain prominence in global business solutions and enable winning analytics capabilities, the service providers in this space will become the envy of their competitors.


Photo Credit: pichetw/
