The University of Illinois’ Graduate School of Library and Information Science defines Data Curation as “the active and ongoing management of data through its life cycle of interest and usefulness.” Sayeed Choudhury, Associate Dean for Research Data Management at Johns Hopkins University (JHU) and leader of the Data Conservancy, further breaks Data Curation down into three iterative activities:
- Preserving: Collecting and taking care of research data.
- Sharing: Revealing data’s potential across domains.
- Discovering: Promoting the re-use and new combinations of data.
According to Alation:
“In practice, data curation is more concerned with maintaining and managing the metadata rather than the database itself and, to that end, a large part of the process of data curation revolves around ingesting metadata such as schema, table and column popularity, usage popularity, top joins/filters/queries. Data curators not only create, manage, and maintain data, but may also be involved in determining best practices for working with that data. Data curators often present the data in a visual format such as a chart, dashboard or report.”
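As a rough illustration of the kind of usage metadata the quote mentions (table popularity and top joins), the sketch below derives both from a small query log. It is a minimal sketch under stated assumptions, not how Alation or any particular catalog works: the `QUERY_LOG` contents, the function names `table_popularity` and `top_joins`, and the naive regular expression are all hypothetical, and a production tool would use a real SQL parser.

```python
import re
from collections import Counter

# Hypothetical query log; a real catalog would ingest this from database logs.
QUERY_LOG = [
    "SELECT id, total FROM orders WHERE total > 100",
    "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT sku FROM products",
    "SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id",
]

# Naive assumption: the identifier right after FROM or JOIN is a table name.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)


def tables_in(query):
    """Return the set of lower-cased table names referenced by one query."""
    return {name.lower() for name in TABLE_REF.findall(query)}


def table_popularity(queries):
    """Count how many queries reference each table (once per query)."""
    counts = Counter()
    for query in queries:
        counts.update(tables_in(query))
    return counts


def top_joins(queries):
    """Count pairs of tables that appear together in queries containing JOIN."""
    counts = Counter()
    for query in queries:
        if not re.search(r"\bJOIN\b", query, re.IGNORECASE):
            continue
        tables = sorted(tables_in(query))
        for i, left in enumerate(tables):
            for right in tables[i + 1:]:
                counts[(left, right)] += 1
    return counts


popularity = table_popularity(QUERY_LOG)
joins = top_joins(QUERY_LOG)
print(popularity.most_common())
print(joins.most_common(1))
```

Counts like these are the sort of metadata a curator could attach to a table's catalog entry so that users see which tables are most used and which are commonly joined.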
Other definitions of Data Curation include:
- “The process of collecting data from diverse sources and integrating it into repositories that are many times more valuable than the independent parts.” (TechRepublic)
- “Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle.” (Digital Curation Centre)
- “The process of ‘caring’ for data, including organizing, describing, cleaning, enhancing and preserving data for public use. Through curation the ICPSR (the International Leader in Data Stewardship) provides meaningful and enduring access to data.” (ICPSR)
- “A means of managing data that makes it more useful for users engaging in data discovery and analysis.” (Alation)
Businesses perform Data Curation to:
- Enable data discovery and retrieval.
- Maintain Data Quality.
- Add Value.
- Provide for data reuse over time.
- Maximize Access.
- Leverage human responses towards customized knowledge.
- Complement work in Data Governance.
Data Curation processes:
- Make Machine Learning more effective.
- Better handle Data Swamps.
- Educate audiences.
- Speed innovation.
Photo Credit: chombosan/Shutterstock.com