Data Curation 101: The What, Why, and How

Humans have an imperative to practice Data Curation. People have always gathered, maintained, and archived data, and they continue to do so at ever greater volumes. They are driven to obtain data that is useful today and tomorrow.

As Mike Schmoker elegantly states, “Things get done only if the data we gather can inform and inspire those in a position to make a difference.” But organizations struggle to get things done and to operationalize Big Data well, especially when 41 percent of 150 executives at large companies said that their data was too siloed. Without access to good Data Curation, business effectiveness decreases.

Risks of poor or no Data Curation include factually inaccurate information, incorrect guidelines, and knowledge gaps. This scenario has played out before and continues to replay. For example, out of 401 items sent for a child passenger safety study by 101 organizations, about 25 percent of the evaluated items contained complete and accurate information. Each item could be thought of as a data collection. Less than 1 percent of the items appeared to have been developed for other relatives or audiences transporting children, indicating knowledge gaps.

The resulting electronic collection, along with the insights into the curated data provided by individual institutions, remained in use long after the study ended. A collection of about 400 materials, siloed and leading to inappropriate selection and installation of child seats, may seem small compared to using Big Data to make inaccurate financial decisions that affect millions of customers. In both cases, good Data Curation is a must.

What is Data Curation?

Data Curation is a means of managing data that makes it more useful for users engaging in data discovery and analysis. Data curators collect data from diverse sources, integrating it into repositories that are many times more valuable than the independent parts. Data Curation includes data authentication, archiving, management, preservation, retrieval, and representation.
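As a rough illustration of what integrating data from diverse sources can look like, here is a minimal Python sketch. Everything in it is hypothetical and not taken from any particular curation tool: the `CuratedRecord` class, the `curate` function, the `crm` and `billing` source names, and the rule that earlier sources win on conflicts are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedRecord:
    """One entry in a hypothetical curated repository."""
    key: str
    values: dict
    # Provenance: names of the sources that contributed to this record.
    sources: list = field(default_factory=list)


def curate(sources: dict) -> dict:
    """Merge records from several sources into one repository keyed by id.

    `sources` maps a source name to a list of dicts that each carry an "id".
    Assumed rule: earlier sources win on conflicts; later ones only fill gaps.
    """
    repository = {}
    for source_name, records in sources.items():
        for raw in records:
            record = repository.setdefault(
                raw["id"], CuratedRecord(key=raw["id"], values={})
            )
            for name, value in raw.items():
                if name != "id" and value is not None:
                    record.values.setdefault(name, value)
            record.sources.append(source_name)
    return repository


# Two made-up sources describing overlapping customers.
crm = [{"id": "c1", "name": "Acme", "city": None}]
billing = [
    {"id": "c1", "name": "ACME Inc.", "city": "Austin"},
    {"id": "c2", "name": "Bolt", "city": "Reno"},
]

repo = curate({"crm": crm, "billing": billing})
print(repo["c1"].values)   # {'name': 'Acme', 'city': 'Austin'}
print(repo["c1"].sources)  # ['crm', 'billing']
```

The merged record for `c1` is more complete than either source on its own, and the `sources` list keeps track of where it came from, which is one simple way to support the authentication and retrieval activities mentioned above.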

Characteristics of Data Curation include:

  • Social Signals: Data’s usefulness depends on human interaction. Aaron Kalb, the Head of Product at Alation, calls these social signals, or behavioral interactions. Just as Amazon presents recommendations based on what users choose, Data Curation leverages human responses to build customized knowledge. Data Analysts apply their own methodologies when interpreting and manipulating data. Data Curation provides access to this kind of human knowledge, which can offer valuable insight into how others do their work. As Stephanie McReynolds, VP of marketing at Alation, says:

“The process of ideating around data and having it be an open communication around all the aspects of data brings the entire organization up to another level of data literacy so that we can really find useful solutions rather than get stuck in our own little silo.”

  • Active Management throughout the Data Lifecycle: The University of Illinois’ Graduate School of Library and Information Science defines Data Curation as “the active and ongoing management of data through its life cycle of interest and usefulness.” This lifecycle comprises the steps of conceptualizing, creating, accessing, using, appraising, selecting, disposing, ingesting, reappraising, storing, reusing, and transforming data. During this process, data might be annotated, tagged, presented, and published for various purposes. Data Curation means actively managing data to reduce threats to its long-term value and to mitigate digital obsolescence.
  • Complementary Work with Data Governance: Data Curation complements Data Governance, but does not replace it. According to the DAMA International Data Management Body of Knowledge, “Data Governance is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.” Implementing a Data Governance program results in policies on how to handle data. Data Curation may make use of Data Governance policies when customizing information. However, Data Curation produces customized business data, like a modern corporate library. The resulting Data Collections offer more relevant information that is easier to search, not just a set of policies.

What is Data Curation Doing for the Data Industry?

As well as reducing duplication of effort in research data creation, Data Curation enhances the long-term value of existing data by making it available for further high-quality research. Data Curation does the following for the Data Industry:

  • Making Machine Learning More Effective: Machine Learning algorithms have made great strides towards understanding the consumer space. AI consisting of “neural networks” can collaborate and use Deep Learning to recognize patterns. However, humans need to intervene, at least initially, to direct algorithmic behavior towards effective learning. Stephanie McReynolds, VP of marketing at Alation, says, “Curations are about where the humans can actually add their knowledge to what the machine has automated.” This prepares the ground for intelligent self-service processes and sets organizations up for insights. Forrester research shows that insights-driven firms are 69 percent more likely to report year-over-year revenue growth of 15 percent or more.
  • Dealing with Data Swamps: A Data Lake strategy allows users to easily access raw data and to consider multiple data attributes at once, and it gives them the flexibility to ask ambiguous, business-driven questions. But Data Lakes can end up as Data Swamps, where finding business value becomes like a quest to find the Holy Grail. Such Data Swamps might as well be Data graveyards. The Geological Survey of Alabama (GSA) has first-hand experience with this. The GSA has been reviving decades of dark (dead) data that could provide value. As part of that effort, the GSA has undertaken Data Curation to discover which of this data has locked-in value, even if it is old, that can be redirected to the benefit of users. This has led to a new GSA website with customized Data Collections.
  • Educating Audiences: Data Curation provides intrinsic value in educating users. Take the legal profession. “Ultimately, the goal of any attorney is to get the jury to understand the case facts as they see them, so anything you can do to educate the jury to the forensics is extremely helpful,” says Jason Fries, CEO of 3D-Forensic. Through the curated information provided by 3D-Forensic, the jury learns how forensics created the analysis and receives explanations of the opinions of the experts involved in the case.
  • Ensuring Data Quality: Data Curators clean data and undertake actions to ensure the long-term preservation and retention of the authoritative nature of digital objects.

“Through the curation process, data are organized, described, cleaned, enhanced, and preserved for use, much like the work done on paintings or rare books to make the works accessible now and in the future,” according to ICPSR.

The value of these Data Curation activities and the resulting attention to quality improve Data Research and Management. For example, Data Curation tasks pertaining to Biodiversity have led to a framework to assess data’s fitness for use and increased data value. As a result, two Global Biodiversity Information Facility (GBIF) task groups have more useful data on Species Distribution Modeling and Agro-biodiversity for collaboration.

  • Speeding Innovation: Organizations are looking to identify ways they can manage data most effectively, while establishing the collaborative ecosystem to enable this efficiency. Data Curation enhances collaboration by opening up and socializing how data is used. This results in innovation, as mentioned by Harvard Business Review. The article describes how the head of the U.S. Army’s Rapid Equipping Force built a curation process, including internal and external collaboration, to help technology solutions be deployed rapidly. In this case, Data Curation helped the U.S. Army identify who the customers for possible solutions would be, who the internal stakeholders would be, and even what initial minimum viable products might look like.
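To make the “Ensuring Data Quality” point above a little more concrete, the following Python sketch shows one very simple way a curator might score a collection’s fitness for use. It is not the GBIF framework or any published method. The `assess_fitness` function, the field names, and the sample observations are all invented for illustration, and “fitness” is assumed here to mean only that every required field is filled in.

```python
def assess_fitness(records, required_fields):
    """Return the share of records in which every required field is filled in.

    A deliberately narrow, assumed notion of "fitness for use": a record
    counts as usable only if none of its required fields is None or "".
    """
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(name) not in (None, "") for name in required_fields)
        for rec in records
    )
    return complete / len(records)


# Made-up species observations; two of the four have missing values.
observations = [
    {"species": "Quercus alba", "lat": 33.2, "lon": -87.5},
    {"species": "Quercus alba", "lat": None, "lon": -87.5},
    {"species": "", "lat": 34.0, "lon": -86.9},
    {"species": "Pinus taeda", "lat": 32.4, "lon": -86.3},
]

score = assess_fitness(observations, ["species", "lat", "lon"])
print(score)  # 0.5
```

A real assessment would likely also check accuracy, consistency, and timeliness, but even a completeness score like this gives curators a starting point for deciding which records need attention.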

Data Curation: Advantages and Challenges

Shacklett notes, “Data Curation is just now starting to enter corporate vocabulary because of Big Data and the need to aggregate data from diverse sources to form a unique picture of a business situation.” Why now? Industry prognosticators and companies are beginning to think about their data as a corporate asset. Companies are beginning to understand that they can’t just continue to blindly “store up” the vast piles of data streaming into them without developing a way to value this data and to determine which data has present or potential value, and which will remain virtually useless. Data Curation gives organizations the means to get useful data by leveraging expertise and knowledge of their own data assets.

However, Data Curation requires a huge investment, as Dianne Esbar, associate partner and brand leader at Digital McKinsey in San Francisco, notes. It requires companies to find the right people to curate data and give them the right tools. This presents a challenge to many companies. “Either they overinvest in tools that don’t work with each other or don’t give them what they need, or they have an army of people who in ten years’ time won’t be as valuable.”

On the path to establishing successful Data Curation, Kathy Rondon explained that Data Curation is about “contextual Metadata” and presented four primary requirements for setting up a successful Data Curation program at the DATAVERSITY® Enterprise Data World 2017 Conference in Atlanta, Georgia. By staying educated and informed on Data Curation best practices, including data reviews with end users, companies can reap its benefits.


Photo Credit: Casezy idea/