
What is Data Preparation?

November 17, 2018

Data Preparation is the process of getting data ready for use in Analytics. In the past, Data Preparation was a time-consuming task handled by the IT team, involving “Data Extraction, Transformation and Loading (ETL), access to Data Warehouses and Data Marts, and lots of complicated massaging and manipulation of data across other data sources,” says Kartik Patel. Organizations are now increasingly turning to sophisticated Self-Service Data Preparation tools that allow business users to prepare data themselves.

Other Definitions of Data Preparation Include:

  • A “pre-processing step in which data from one or more sources is cleaned and transformed to improve its quality prior to its use in business analytics.” (Informatica)
  • Technology that allows administrators to make faster and better decisions through Data Quality and Data Access. (Jon Pilkington, DATAVERSITY®)
  • “Data preparation is the process of collecting data from a number of (usually disparate) data sources, and then profiling, cleansing, enriching, and combining those into a derived data set for use in a downstream process.” (Paxata)
  • A process to “identify and separate the relevant data items from a large body of data, so the separate items can be used in analytics queries.” (Mary Shacklett, TechRepublic)
  • The “most time-consuming task in analytics and BI [that] is evolving from a self-service activity to an enterprise imperative.” (Ehtisham Zaidi, et al., Gartner)
  • “The process of collecting, cleaning and consolidating data into one file or data table, primarily for use in analysis.” (Datawatch)
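Several of the definitions above describe the same broad sequence: collect data from more than one source, clean it, and consolidate it into a single table for analysis. The following is a minimal sketch of that sequence in plain Python, not a description of how any of the quoted vendors' tools work. The two sources (`crm_rows`, `order_rows`), their field names, and the cleaning rules are all invented for illustration.

```python
# Hypothetical raw inputs: a CRM export and an orders feed.
# All names and values here are invented for illustration.
crm_rows = [
    {"customer_id": " 101", "name": "alice smith ", "region": "west"},
    {"customer_id": "102", "name": "Bob Jones", "region": ""},
    {"customer_id": "101", "name": "Alice Smith", "region": "West"},  # duplicate id
    {"customer_id": "", "name": "Unknown", "region": "East"},         # missing key
]
order_rows = [
    {"customer_id": "101", "amount": "19.99"},
    {"customer_id": "102", "amount": "5.00"},
    {"customer_id": "101", "amount": "n/a"},  # unparseable amount
]


def clean_customers(rows):
    """Trim whitespace, normalize casing, drop rows without an id, keep the first row per id."""
    cleaned = {}
    for row in rows:
        customer_id = row["customer_id"].strip()
        if not customer_id:
            continue  # a row with no key cannot be combined with other sources
        if customer_id in cleaned:
            continue  # keep the first occurrence, skip later duplicates
        cleaned[customer_id] = {
            "customer_id": customer_id,
            "name": row["name"].strip().title(),
            # An empty region becomes None so missing values are explicit.
            "region": row["region"].strip().title() or None,
        }
    return cleaned


def total_orders(rows):
    """Sum order amounts per customer, skipping amounts that are not numbers."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        key = row["customer_id"].strip()
        totals[key] = totals.get(key, 0.0) + amount
    return totals


def prepare(crm, orders):
    """Combine the cleaned sources into one table (a list of dicts)."""
    customers = clean_customers(crm)
    totals = total_orders(orders)
    return [
        {**customer, "total_spent": round(totals.get(cid, 0.0), 2)}
        for cid, customer in customers.items()
    ]


prepared = prepare(crm_rows, order_rows)
for row in prepared:
    print(row)
```

In practice each rule here (which duplicate to keep, how to treat a missing region, whether to drop or flag a bad amount) is a decision that depends on the data and the analysis it feeds, which is part of why the definitions above describe Data Preparation as time-consuming.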

Businesses Use Data Preparation to:


About the author

Michelle Knight enjoys putting her information specialist background to use by writing technical articles on enhancing Data Quality so that data leads to useful information. Michelle has written articles on the W3C validator for SiteProNews, SEO competitive analysis for the SLA (Special Libraries Association), search engine alternatives to Google for the Business Information Alert, and introductions on the Semantic Web, HTML 5, and Agile, Seabourne INC LLC, through AboutUs.com. She has worked as a software tester, a researcher, and a librarian. She has over five years of experience contracting as a quality assurance engineer at a variety of organizations, including Intel, Cigna, and Umpqua Bank. During that time Michelle used HTML, XML, and SQL to verify software behavior through databases. Michelle graduated from Simmons College with a Master's in Library and Information and an Outstanding Information Science Student Award from the ASIST (The American Society for Information Science and Technology), and has a Bachelor of Arts in Psychology from Smith College. Michelle has a talent for digging into data, a natural eye for detail, and an abounding curiosity about finding and using data effectively.
