Data Preparation describes the process of getting data ready for use in analytics. In the past, Data Preparation was a time-consuming task handled by the IT team, involving “Data Extraction, Transformation and Loading (ETL), access to Data Warehouses and Data Marts, and lots of complicated massaging and manipulation of data across other data sources,” says Kartik Patel, but organizations are increasingly turning to sophisticated Self-Service Data Preparation tools that allow business users to prepare data themselves.
Other Definitions of Data Preparation Include:
- A “pre-processing step in which data from one or more sources is cleaned and transformed to improve its quality prior to its use in business analytics.” (Informatica)
- Technology that allows administrators to make faster and better decisions through Data Quality and data access. (Jon Pilkington, DATAVERSITY®)
- “Data preparation is the process of collecting data from a number of (usually disparate) data sources, and then profiling, cleansing, enriching, and combining those into a derived data set for use in a downstream process.” (Paxata)
- A process to “identify and separate the relevant data items from a large body of data, so the separate items can be used in analytics queries.” (Mary Shacklett, TechRepublic)
- The “most time-consuming task in analytics and BI [that] is evolving from a self-service activity to an enterprise imperative.” (Ehtisham Zaidi, et al., Gartner)
- “The process of collecting, cleaning and consolidating data into one file or data table, primarily for use in analysis.” (Datawatch)
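Several of these definitions (notably Paxata's and Datawatch's) describe the same concrete sequence: profile the sources, cleanse them, then combine them into one derived table. The following is a minimal Python sketch of that sequence using only the standard library. The two sources, their column names, and the helper functions `profile`, `cleanse_customers`, and `combine` are hypothetical and exist only for illustration. Self-Service Data Preparation tools typically expose these steps through their own interfaces rather than hand-written code like this.

```python
from collections import Counter

# Hypothetical raw extracts from two disparate sources (illustrative data only).
crm_rows = [
    {"customer_id": " 001 ", "name": "alice SMITH", "region": "east"},
    {"customer_id": "002", "name": "Bob Jones", "region": None},
    {"customer_id": "001", "name": "Alice Smith", "region": "East"},  # duplicate customer
    {"customer_id": None, "name": "Unknown", "region": "West"},      # missing join key
]
order_rows = [
    {"customer_id": "001", "amount": "120.50"},
    {"customer_id": "002", "amount": "80"},
    {"customer_id": "001", "amount": "30.00"},
]


def profile(rows):
    """Profiling: count missing or blank values per column."""
    missing = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None or (isinstance(val, str) and not val.strip()):
                missing[col] += 1
    return dict(missing)


def cleanse_customers(rows):
    """Cleansing: trim keys, normalize text, drop keyless rows, de-duplicate."""
    cleaned = {}
    for row in rows:
        key = (row["customer_id"] or "").strip()
        if not key:
            continue  # a record without an identifier cannot be joined
        if key in cleaned:
            continue  # keep only the first occurrence of each customer
        cleaned[key] = {
            "customer_id": key,
            "name": row["name"].strip().title(),
            "region": (row["region"] or "unknown").strip().title(),
        }
    return cleaned


def combine(customers, orders):
    """Combining: total the orders per customer and join onto cleansed records."""
    totals = Counter()
    for order in orders:
        totals[order["customer_id"].strip()] += float(order["amount"])
    return [
        {**cust, "total_spend": round(totals.get(cid, 0.0), 2)}
        for cid, cust in sorted(customers.items())
    ]


print(profile(crm_rows))
derived = combine(cleanse_customers(crm_rows), order_rows)
for row in derived:
    print(row)
```

The result is one table with a single row per customer, ready for a downstream analytics query. Dropping keyless rows and keeping the first duplicate are simplifying choices for this sketch; a real pipeline might instead route such records to a review queue.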
Businesses Use Data Preparation to:
- Empower business users and reduce the burden on IT.
- Make the best use of resources.
- Use existing knowledge and skills to identify trends and patterns.
- Bring agility to the decision-making process.
- Reduce time to analyze data.
- Get data ready for supervised Machine Learning.
Photo Credit: Harry Huber/Shutterstock.com