By Farnaz Erfan.
Self-service data prep first gained traction in business analytics, as a productivity and enablement tool for analysts developing insights. However, as Gartner recently pointed out, the market has since evolved toward use cases that require platform capabilities such as collaboration and integration into all types of data and analytics projects.
According to Gartner’s Market Guide for Data Preparation Tools:
“The market for data preparation tools has evolved from being able to support only self-service use cases. Modern data preparation tools now enable data and analytics teams to build agile datasets at an enterprise scale, for a range of distributed content authors.”
Take Nationwide, for example. The company recently discussed how 500 individuals across 76 teams, including Product Pricing, Finance and Accounting, Backoffice Operations, Commercial Lines, and more, are using data preparation to create line-of-business autonomy and a data-driven organizational culture.
As you can imagine, for self-service data preparation to expand into enterprise-wide usage of the kind Nationwide demonstrates, certain capabilities are required. For instance, embedded catalog capabilities enable data practitioners to discover certified information assets that other teams have developed and contributed to. This, in turn, makes it possible to share, annotate, and reuse preparation steps. It also enables data practitioners to source new or curated data and transform it into new and updated information assets on a repeated basis, a popular requirement that goes beyond self-service scenarios.
Granted, in some cases, data preparation starts and stops within one use case. For example, preparing customer data for segmentation and targeting is often a one-off task or, at most, one that doesn’t happen on a regular basis. While some use cases remain ad hoc and one-time, others mature into repeatable processes. For those cases, a data preparation solution with advanced and intelligent automation is the best fit for establishing a cadence of data flows and the integration among them, and adopting one is often the first step of maturity. A natural transition happens when self-service, where an individual is continuously involved in the preparation of data, gives way to automated data prep, where human involvement is perhaps limited to oversight.
With larger deployments, a natural progression leads to collaboration among many users with different skill sets, e.g. data scientists, data engineers, data analysts, business operations managers, directors of insights, and more. As skill sets vary across these teams, different styles of data preparation might be considered. However, a data preparation tool needs to strike the right balance between rich functionality and ease of use in order to cater to all skill sets, or to create what Gartner calls Information as a Second Language among all data practitioners.
This is another step toward maturity, where a self-service data preparation paradigm transitions into a collaborative, crowdsourced one, accelerating adoption and onboarding across all roles. Data preparation solutions range from stand-alone tools to those embedded in data integration or business intelligence solutions. As Gartner’s earlier Market Guide rightly points out, those that provide data preparation only within the context of another application are limited in reaching this level of maturity.
Prepared data will naturally become the input to many initiatives, e.g. business intelligence and analytics, master data catalogs, data science projects, or intelligent systems. As such, maturity also requires close integration with the rest of the information management stack from an architectural point of view. When data preparation is viewed as a broad strategy, an enabler of data democratization, understanding the maturity requirements of such a solution not only ensures that end users understand the data and that its use is democratized, but also maximizes the integration of self-service data assets into the organization’s broader information fabric.