Challenges for Data Governance and Data Quality in a Machine Learning Ecosystem

The high availability of data, enhanced computing power and advanced Data Science technologies together make a lethal combination for data-driven outcomes. With the open data economy just around the corner, well-tuned Data Governance capabilities will be the goal of most businesses.

Current Data Management practices are focused on risk-free data sharing and regulatory compliance. In an open data economy, reduced data-sharing risks and stronger governance mechanisms are keys to success. As Data Governance continues to gain prominence in data-powered business models, organizations will invest in advanced data technologies such as artificial intelligence (AI) and machine learning (ML) to “achieve quality, compliance, and security at scale.” According to Bill Tomazin, Managing Partner, West Region and National Audit Solutions, at KPMG LLP U.S.:

“If the data is not reliable or of poor quality, less-than-optimal business decisions are likely.”

As recounted by the author of The Impact of Data Quality in the Machine Learning Era, Data Quality assumes even more importance in the ML-powered, self-service analytics era because business users are not qualified to assess the quality of the data in use. Businesses now realize that unless Data Quality issues are tackled first, their AI investments may go to waste. In the modern business analytics regime, an increasing variety of data sources and input channels, high data volumes, and “unstructured data types” have added to Data Management woes, especially in the areas of Data Quality and Data Governance. A McKinsey report, The Insights Value Chain: Data Quality Challenges in IoT, highlights the Data Quality challenges in Internet of Things (IoT) data.

Challenges for Data Quality in Digital Businesses

While multi-type and multi-source data has enriched enterprise data troves, Data Management has become a serious challenge because of poor Data Quality. Data Quality management continues to haunt Data Management experts, who know that unless Data Quality issues are tackled properly, businesses can lose the golden opportunity of deriving competitive intelligence. Most researchers also think that Data Quality concerns handicap the true potential of data-driven enterprises. The use of ML technology to mitigate Data Quality challenges is still limited, though most industry leaders believe that ML has the potential to confront Data Quality problems head-on. Moreover, the solutions that advanced AI/ML platforms provide for Data Quality are often highly economical and efficient. Since “manual Data Quality assessment cleansing” was replaced by automated tools, data professionals have recouped valuable work time for actual Data Science tasks.

ML solutions currently have the capability to “assess the quality of data assets, predict missing values, and provide cleansing recommendations, thereby reducing the complexity and efforts spent by data quality experts and scientists.”  
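To make the idea of assessing quality and predicting missing values concrete, here is a minimal sketch in plain Python. It is not the method of any particular vendor or of the quoted source: the column names, the sample records, and the choice of a simple k-nearest-neighbors estimate are all assumptions made for illustration.

```python
from math import sqrt
from typing import Optional

Row = dict[str, Optional[float]]


def completeness(rows: list[Row]) -> dict[str, float]:
    """Share of non-missing values per column (1.0 means fully populated)."""
    return {
        col: sum(r[col] is not None for r in rows) / len(rows)
        for col in rows[0]
    }


def knn_impute(rows: list[Row], target: str, k: int = 2) -> list[Row]:
    """Fill missing `target` values with the mean of the k nearest complete rows."""
    features = [c for c in rows[0] if c != target]
    # Rows with every value present serve as the reference set.
    known = [r for r in rows if all(r[c] is not None for c in r)]
    filled = []
    for r in rows:
        can_estimate = (
            r[target] is None
            and known
            and all(r[f] is not None for f in features)
        )
        if can_estimate:
            nearest = sorted(
                known,
                key=lambda o: sqrt(sum((r[f] - o[f]) ** 2 for f in features)),
            )[:k]
            estimate = sum(n[target] for n in nearest) / len(nearest)
            filled.append({**r, target: estimate})
        else:
            filled.append(dict(r))  # copy unchanged; input is never mutated
    return filled


# Hypothetical sample data: income (in thousands) is missing for one record.
records = [
    {"age": 25.0, "income": 40.0},
    {"age": 30.0, "income": 50.0},
    {"age": 45.0, "income": 90.0},
    {"age": 28.0, "income": None},
]

print(completeness(records))
print(knn_impute(records, target="income")[3])
```

The completeness score plays the role of the quality assessment, and the imputed value is a recommendation rather than a fact; in practice a data steward would likely review such estimates before they are written back.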

With data entry points increasing by the day, businesses are struggling to collect and store that data in an efficient manner. AI provides the opportunity to automate the data entry process through “intelligent capture,” thereby enhancing the quality of incoming data. Good quality data enhances the quality of marketing campaigns and predictive analytics. Review this blog post to get the latest information about AI, ML, and Master Data Management working together to deliver the best Data Management outcome.  

The article, Challenges of Data Quality in the AI Ecosystem, helps to bring out the common Data Quality issues inherent in AI projects, where advanced data technologies like ML and deep learning (DL) are collectively used to manage “data capture, data storage, data preparation, and advanced data analytics.” To describe the magnitude of the problems, the author of this article quotes Nathaniel Gates, CEO and co-founder of Alegion, an AI and ML training data platform:

“The single largest obstacle to implementing ML models into production is the volume and quality of the training data.”

Challenges for Data Governance in Digital Businesses

As the author of a blog post points out, “the core and often niggling issue of Data Quality” further complicates an organization’s Data Management complexity due to “disparate data sources, immense data volumes, and unstructured data types.” While AI/ML powered systems continue to gain momentum in digital businesses, the absence of solid Data Governance frameworks has the “potential to unleash unreliable and misleading information and unexpected expense overheads.”

Here are some common and often debated Data Governance challenges facing AI/ML powered enterprises:

  • The data access controls — who has access to what data?
  • The accuracy, consistency, and reliability of data.
  • The current data storage and integration infrastructures — are they adequate?
  • The security issues surrounding data movements within and outside of businesses.
  • The implemented Data Governance Plans — what is lacking?
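The first question in the list, who has access to what data, can be illustrated with a small role-based check. This is only a sketch under stated assumptions: the `AccessPolicy` class, the role names, and the dataset names are hypothetical and do not come from any of the articles cited here.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Maps each role to the datasets it may read (hypothetical structure)."""

    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, dataset: str) -> None:
        self.grants.setdefault(role, set()).add(dataset)

    def can_read(self, role: str, dataset: str) -> bool:
        # Deny by default: unknown roles and ungranted datasets are refused.
        return dataset in self.grants.get(role, set())

    def who_can_read(self, dataset: str) -> list[str]:
        """Answer the governance question directly for one dataset."""
        return sorted(r for r, ds in self.grants.items() if dataset in ds)


policy = AccessPolicy()
policy.grant("analyst", "sales_summary")
policy.grant("auditor", "sales_summary")
policy.grant("auditor", "customer_pii")

print(policy.can_read("analyst", "customer_pii"))
print(policy.who_can_read("sales_summary"))
```

Keeping the grants in one inspectable structure means the question can be answered per dataset as well as per role, which is the view an audit team would usually want.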

The Forbes author has explored the core issues of a Data Governance Plan in an AI-powered Data Management environment, which include data integrity, data security, data integration, and lastly Data Governance. Apart from looking at Data Quality, access controls, consistency, and storage-integration techniques, the article also analyzes the possibilities of data-driven insights in an AI/ML powered business ecosystem. An article from Data Republic reveals the top Data Governance trends visible in digital businesses today, where Metadata Management, Data Modeling, Data Quality, and data security take high priority. According to the author of this article, a good Data Governance Plan tracks “data sources, data usage, and data lineage from origin to final use,” and aims to blur the “distinctions between people, process, digital, analytics and data.”
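Tracking lineage “from origin to final use” can be pictured as walking a graph of datasets back to their sources. The sketch below assumes lineage is recorded as a simple mapping from each derived dataset to the datasets it was built from; the `trace_origins` function and all dataset names are hypothetical.

```python
def trace_origins(lineage: dict[str, list[str]], dataset: str) -> set[str]:
    """Return the original sources (datasets with no recorded parents)."""
    origins: set[str] = set()
    seen: set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles and repeated ancestors
        seen.add(current)
        parents = lineage.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            origins.add(current)
    return origins


# Hypothetical lineage: a report built from features, built from two sources.
lineage = {
    "churn_report": ["customer_features"],
    "customer_features": ["crm_export", "web_clickstream"],
}

print(sorted(trace_origins(lineage, "churn_report")))
```

With lineage stored this way, a quality problem found in a report can be traced back to the source systems that feed it.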

Machine Learning Viewed as a Savior for Data Governance

An article, Data Governance and Machine Learning, reveals the current status of AI adoption in the industry. On one hand, C-suite executives are more than eager to embrace AI-enabled Data Management solutions; on the other hand, technology experts are sure that AI/ML adoption may remain a distant dream unless sound Data Strategy plans, of which Data Governance is a core component, are in place.

Here is an interesting blog post titled Metadata and Machine Learning in Data Governance, in which the author argues that in a post-GDPR world, metadata plays a crucial role in Data Governance, as evidenced by the rise of topical discussions on the “role of metadata in Data Governance.” Earlier, Gartner declared that by 2020, 50% or more of Data Governance “policies will be driven by metadata.” By making business practices transparent through a “common vocabulary and an auditable process,” metadata has now helped ML technologies to populate the business corridors.

Any modern enterprise must have a proper Data Management infrastructure in place to reap the benefits of “technology-supported decision-making” facilitated by advanced AI/ML systems. But then for these advanced technological systems to deliver competitive intelligence, the flow of data has to be “tracked, controlled, and monitored” throughout its journey in an end-to-end enterprise analytics system.

A Popular Data Governance Use Case: Financial Sectors

In the article titled How Can Machine Learning Affect Your Organizational Data Strategy, the author stresses that the success of ML solutions is closely interlinked with Data Governance strategies at work within an enterprise. Currently, while general US businesses are busy implementing CCPA or its many variations across the country, the financial sector seems to have found a convincing answer in ML-powered solutions. Taking a sector-by-sector investigative stance, the AI service vendors think that their solutions are designed to address all regulatory or compliance requirements commonly plaguing the financial services sectors.

Read this Forbes post to understand how audit teams in the financial sector can act as a watchdog for internal Data Governance practices by examining the operative Data Governance framework. The auditors can also ensure that an organization’s Data Governance practices are aligned with the overall corporate vision. The American Institute of Certified Public Accountants published a report titled An Overview of Data Management, which confirms that internal audit teams routinely apply Data Governance principles in their daily work with financial data.

As digital businesses rely solely on the power of data for their operations, Data Governance plays a strategic role in delivering competitive advantages. Data, coupled with advanced technologies, can push a business to the pinnacle of success if used properly. However, as revealed by Guardians of Trust, a KPMG International report, 2,200 business executives are concerned about the governance challenges of “data on a shared platform,” as in healthcare or manufacturing businesses. In these industry sectors, many parties typically exchange data on a high-frequency basis, which challenges the integrity of the available data.

Data and Business Teams Playing Team Sports

A DBTA article, focusing on cost justifications for Data Quality technology investments in AI/ML systems, reveals that the primary source of “poor-quality data” is sales departments, where the sales staff frequently enters incorrect or incomplete data into the CRM system. The poor data can easily propagate to other departments or functions via linked processes and applications. The basic problem of Data Management, as this article indicates, is the lack of communication between the IT and business staff. The business staff thinks data is an IT problem, while the IT department thinks clean data is the responsibility of the business staff, who create the data.
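One way to stop incorrect or incomplete CRM entries from propagating is to check each record at the point of entry. The following is a minimal sketch, not a description of any real CRM product: the required fields, the loose email pattern, and the `validate_crm_record` function are assumptions chosen for illustration.

```python
import re

# Hypothetical rules; a real CRM would define its own required fields.
REQUIRED_FIELDS = ("name", "email", "country")
# Deliberately loose check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_crm_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field_name in REQUIRED_FIELDS:
        if not record.get(field_name, "").strip():
            problems.append(f"missing {field_name}")
    email = record.get("email", "").strip()
    if email and not EMAIL_PATTERN.match(email):
        problems.append("malformed email")
    return problems


entry = {"name": "Ada Lovelace", "email": "ada@example", "country": ""}
print(validate_crm_record(entry))
```

Returning the full list of problems, rather than failing on the first one, gives the person entering the data a single chance to correct everything, which is where the business and IT sides of the problem meet.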

At a recent business summit, business leaders, while acknowledging the importance of a Data Strategy for data-driven insights, failed to share their own success with a clearly defined Data Strategy.

They felt that data practices should include the business and data staff as part of a team. The team would use “translators” to “serve as connective tissue to bridge the communication gap that can exist between the business and technical experts.”
