Data Curation needs to be guided by Data Governance so it can secure a place as a core Data Management requirement. MIT Technology Review Insights states, “Data is the single biggest asset at most organizations. It’s time to start treating it accordingly.” Toward that end, the Leader’s Data Manifesto has called leaders and boards to action to make their organizations truly data-driven.
Kelle O’Neal, the CEO and Founder of First San Francisco Partners, breaks this charge down by noting that an enterprise needs to “ensure its data adds value to the enterprise as well as enabling the enterprise to derive value from data.” For data to be useful, Data Curation, guided by a clear Data Governance plan, must make its way into the mature organization.
For data to be an asset, it needs to be maintained, preserved, culled, and added to throughout its life cycle, making Data Discovery and Analytics easier to sustain for both internal and external consumers. Customers and innovative technologies demand it. This defines the Data Curation realm.
Organizations cannot curate in a vacuum. Data policies and controls around curation goals and customer activities need to be applied. This explains the importance of the Data Governance realm. As Ken Kring, Principal at PGTT, states, Data Governance today means “the continual cleaning and integration of data to drive profitable behaviors of customers and employees on an ongoing basis.”
Data Curation is a necessary part of mature Data Management, while Data Governance provides the consistency and balance to allow organizations to leverage Data Curation as a primary element of that entire system.
Data Curation in the Mature Organization
Why associate Data Curation with mature organizations now? Investors and organizations recognize Big Data (and all data) as a corporate asset. As with currency, companies are beginning to understand that they can’t just continue to blindly “store up” the vast piles of data streaming into them without developing a way to value this data and to determine which data has present or potential value. Data curators collect data from diverse sources, integrating it into repositories that are many times more valuable than the independent parts.
Organizations, to be mature, can no longer ignore Data Curation.
Increasing Variety of Data Sources
The 2018 Data & Analytics Global Executive Study and Research Report by MIT Sloan Management Review finds that innovative, analytically mature organizations make use of data from multiple sources.
This includes a variety of data types, such as mobile, social, and public data. As the volume and variety of this data grow, companies need a way to value it and to determine which data has present or potential value and which will remain virtually useless.
To keep and maintain data as an asset, as Pat Hennel notes, corporations need to consider Data Curation.
Machine Learning Growth
Machine Learning will become one of the game changers of the coming decade. As Gartner notes, 35 percent of IT resources will be spent to support the creation of new digital revenue streams, and by 2020 almost 50 percent of IT budgets will be tied to digital transformation initiatives, including Machine Learning.
Stephanie McReynolds, VP of marketing at Alation, says, “Curations are about where the humans can actually add their knowledge to what the machine has automated.” This prepares the ground for intelligent self-service processes and sets organizations up for insights. Hence the drive for Data Curation: it helps organizations stay ahead of, and be more effective on, the Machine Learning curve.
The Data Lake Problem from Moving into the Cloud
By 2020, Gartner analysts say that almost 40 percent of enterprises will use the Cloud to support more than half of transactional systems of record. In fact, according to a 2017 survey, 72 percent of US finance executives said they are either using Cloud-based solutions or plan to do so in the future, which is an increase from 62 percent in the 2016 survey.
With the increase of Cloud migration and streaming, data storage will look more like a Data Lake. The Cloud has become the default ingest point for data for many organizations. To avoid sending Big Data to its death in the Cloud, Data Curation must be considered, with customized data collections. As the Geological Survey of Alabama (GSA) has discovered with its data graveyard, Data Curation has been imperative in resurrecting data.
Data Governance: The Hub of Data Curation
Creating and maintaining data collections through Data Curation needs to be guided by Data Governance. Questions such as who will have access to a data collection, and who will be responsible for adding to and culling it, must be addressed. Preventing collections from becoming data cesspools (aka Data Swamps) and facilitating communication about a collection with a common vocabulary (e.g., a Business Glossary) both require Data Governance.
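The governance questions above (who may read a collection, who may add to or cull it, and what shared vocabulary describes it) can be sketched in a few lines of Python. This is a minimal illustration only: the `CollectionPolicy` class, the role names, the collection name, and the glossary entry are all hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CollectionPolicy:
    """Hypothetical governance policy for one curated data collection."""
    name: str
    steward: str  # role accountable for the collection
    readers: Set[str] = field(default_factory=set)       # roles allowed to query
    curators: Set[str] = field(default_factory=set)      # roles allowed to add or cull
    glossary: Dict[str, str] = field(default_factory=dict)  # shared business terms

    def can_read(self, role: str) -> bool:
        # Curators can always read the collection they maintain.
        return role in self.readers or role in self.curators

    def can_curate(self, role: str) -> bool:
        # Only designated curator roles may add to or cull the collection.
        return role in self.curators


# Illustrative policy; every value here is an assumption.
policy = CollectionPolicy(
    name="customer_orders",
    steward="engineering_data_lead",
    readers={"sales", "analytics"},
    curators={"engineering"},
    glossary={"active customer": "A customer with at least one order in the last 12 months."},
)
```

With a policy like this, a sales user could query the engineering-curated collection (`policy.can_read("sales")`) without being able to alter it (`policy.can_curate("sales")` is false), and both teams would share one definition of "active customer."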
Three reasons why Data Governance needs to direct Data Curation are:
- Self-Service Analytics
Customers and corporations depend on and demand Self-Service Analytics. This form of Business Intelligence enables and encourages business professionals to query data and generate reports. Consider that Self-Service Analytics platforms with superior data visualization capabilities will gain momentum in 2018.
Organizations embracing this “self-service” model will of course need more governance, so that business users have more control over the data they analyze. To connect strict Data Governance with business requirements, an organization needs Data Curation. Any analyst can curate data knowledge and this information can be reused, notes the company Alation. Data Governance policies ensure compliance, collaboration, and faster insights from Data Curation.
- General Data Protection Regulation (GDPR)
The General Data Protection Regulation (GDPR) affects personal information processing and storage in any data collection. Should any Data Curator add, maintain, or archive financial or health data, he or she will likely find personal information in the mix. Non-compliance can bring fines of up to 4 percent of an organization’s annual worldwide revenue or €20 million, whichever is higher.
To help customers deal with the upcoming GDPR requirements, which take effect in May 2018, a number of companies have created systems to help manage the data and comply with the regulation. Issues for DBAs, Data Protection Officers, and many others are being discussed and dealt with by organizations all over the world that will have to contend with these new laws. Data Curation and Data Governance are of the utmost necessity.
- Leveraging Human Interaction
Data Curation provides access to human knowledge and open communication by leveraging human responses towards customized information. As Wilder-James writes, “you can’t cleanly separate the data from its intended use…every new problem has its unique aspects that usually reach back into data acquisition and preparation.”
Data Silos, which arise from structural, political, and growth factors, can make it “prohibitively costly to extract data.” Data Curation without direction from Data Governance promotes Data Silos. Data Governance 2.0 approaches authority and policy from a company-wide view. Data Governance makes it possible, for example, for a person in sales to retrieve information from an engineering Data Curator and data collection to answer a query.
As Danny Sandwell elegantly says, “Data Governance is the foundation of an Enterprise Data Strategy.” Data as an asset needs to be useful to people across a corporation, not just a specific team or department.
Many organizations want to be data-driven. Data Curation provides a means to get value. Data Curation needs to enter into company considerations because of the need to aggregate data from diverse sources to form a unique picture of a business situation.
In addition, staying competitive with Machine Learning and moving Big Data into the Cloud require Data Curation. But Data Curation must be part of a comprehensive Data Governance program. Self-Service Analytics and regulations like the GDPR require it. Setting up a Data Governance program without Data Curation will result in an inability to keep up with the changing Data Management landscape.