DataOps and Scalability: The One-Two Punch for Creating Successful Data Products

Read more about author Guy Adams.

Data products are proliferating in the enterprise, and the good news is that users are consuming data products at an accelerated rate, whether it’s an AI model, a BI interface, or an embedded dashboard on a website. The bad news is that too many data engineering teams still rely on manual methods to keep these data products running, which inhibits growth and the ability to meet business objectives. Luckily, the era of automated DataOps has arrived.

Data engineers are the unsung heroes of the data world, toiling away on their keyboards to ensure that fresh, clean analytics pipelines are always ready for consumption by downstream data product users who need to make informed decisions daily. They spend hours working in ETL/ELT tools and writing Python and YAML scripts to move and transform data. They must know the ins and outs of various APIs for the tools they use and the SQL variants for each database or data warehouse, not to mention the specific data models used by different data catalogs. In other words, it’s not easy being a data engineer.

Data Engineering, Data Products, and DataOps: Pros and Cons to Making Data Actionable

In the early days of big data, having a data engineer-to-data scientist ratio of two-to-one was seen as ideal; however, many companies struggled to hire enough data engineers to gain actionable insights. Backlogs grew as data scientists and analysts submitted requests to data engineers for the specific data they needed for their applications. Data engineers would need to figure out how best to serve these data requests and then do the manual work of building an efficient data pipeline to extract data, join tables, and deliver the final data reliably and predictably. As a result, waiting up to six months for data delivery on specific requests was common for business users.

Over the past few years, we’ve seen the data product emerge as a viable concept. As previously stated, a data product can take many shapes or forms, including a dashboard displaying historical data generated by SQL queries running in a data warehouse, or a machine learning algorithm applied to historical data to predict the future. Today, with the rise of generative AI and large language models (LLMs), a data product can be a response generated by an LLM to a user request submitted via natural language.

No matter what their final form, data products are unique because they provide a repeatable way for operations teams and business teams to access and use data that’s clean, accurate, and well-governed. As users discover that data products are a great way to interact with data, demand for them is increasing.

That is where DataOps comes in. Just as the world of DevOps provided uniformity and consistency to the developer lifecycle, the DataOps era is bringing a new level of automation and scale to the data product support work of the overworked data engineer. Today’s DataOps tools and platforms can help data engineers build and manage more data pipelines – and thus data products – than they could if they were still doing it manually. Thanks to the greater efficiencies that automated DataOps tools bring, it’s not uncommon to see a company go from managing a dozen data products to several hundred data products, an improvement of 10x or more.

DataOps platforms don’t automate everything involved in data pipelines that support data products, so data engineers are still needed to ensure that data pipelines run smoothly. The DataOps platform may automatically generate YAML configuration files and SQL queries, but a human data engineer still needs to confirm that the code is valid.
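Part of that confirmation step can be mechanical. As a minimal sketch, assuming a generated query and a known table schema, an engineer might first check that the query at least compiles before reading it for intent. The function name `sql_compiles` and the `orders` table are hypothetical, and an in-memory SQLite database stands in for the real warehouse, whose SQL dialect will differ.

```python
import sqlite3


def sql_compiles(schema_ddl: str, query: str) -> tuple[bool, str]:
    """Check that a query parses and resolves against a schema.

    Hypothetical helper: SQLite is used only as a stand-in for the
    real warehouse, so dialect-specific SQL may be wrongly rejected.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Build empty tables so column and table names can be resolved.
        conn.executescript(schema_ddl)
        # EXPLAIN makes SQLite compile the statement without running it.
        conn.execute(f"EXPLAIN {query}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE orders (order_id INTEGER, amount REAL);"

ok, _ = sql_compiles(schema, "SELECT order_id, amount FROM orders WHERE amount >= 0")
bad, reason = sql_compiles(schema, "SELECT customer FROM orders")
```

A check like this only catches syntax errors and unknown tables or columns. Whether the query answers the right business question is still a judgment the data engineer has to make.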

DataOps + GenAI = Highly Automated and Scalable Data Pipelines

Today’s data environments are highly dynamic and often demand daily changes in source data. In the world of AI, the underlying models are changing all the time, sometimes for the better, sometimes not. The retrieval-augmented generation (RAG) databases that companies use to improve LLM responses are constantly being refreshed and updated in real time.

To stay on top of this dynamic environment, DataOps platforms are constantly running validation checks. They’re checking to ensure that the data being fed into the data product meets the company’s quality standards. The DataOps platform may provide the capability to automatically generate a handful of feature branches for a data pipeline, but a data engineer still signs off on those changes at the end of the day.
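To make the idea of a validation check concrete, here is a minimal sketch in plain Python. It is not how any particular DataOps platform works: the `Check` class, the `run_checks` function, and the two quality rules are all hypothetical, chosen only to show named rules being applied to every row and the failures being counted.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Check:
    """A named rule that a row either passes or fails (hypothetical)."""
    name: str
    predicate: Callable[[Row], bool]


def run_checks(rows: list[Row], checks: list[Check]) -> dict[str, int]:
    """Return the number of failing rows for each check."""
    failures = {check.name: 0 for check in checks}
    for row in rows:
        for check in checks:
            if not check.predicate(row):
                failures[check.name] += 1
    return failures


# Hypothetical quality standards for an orders feed.
checks = [
    Check("order_id_present", lambda r: r.get("order_id") is not None),
    Check(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.00},   # fails order_id_present
    {"order_id": 3, "amount": -2.50},     # fails amount_non_negative
]

report = run_checks(rows, checks)
passed = all(count == 0 for count in report.values())
```

In this sketch `passed` is false because each rule is violated by one row. A real pipeline would typically use such a result to block a refresh or alert the engineer rather than publish the data.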

Today’s DataOps platforms use GenAI techniques to automate many tasks, from writing the configuration code to running the validation checks. Data engineers can tell the DataOps platform how to construct the data pipeline – including which files to use, what transformations to apply, what kinds of checks to run, and where to land the final data – and the LLM will generate the code.
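The shape of that workflow, a declarative description in and pipeline code out, can be sketched without an LLM. In the illustration below, a simple template function stands in for the model, and every name (`PipelineSpec`, `render_sql`, the table and column names) is hypothetical rather than taken from any real platform.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineSpec:
    """What the engineer declares: source, columns, checks, landing table."""
    source_table: str
    target_table: str
    columns: list[str]
    filters: list[str] = field(default_factory=list)


def render_sql(spec: PipelineSpec) -> str:
    """Turn a spec into SQL. A template stands in for the LLM here."""
    select_list = ", ".join(spec.columns)
    sql = (
        f"CREATE TABLE {spec.target_table} AS "
        f"SELECT {select_list} FROM {spec.source_table}"
    )
    if spec.filters:
        # Quality rules become row filters in this simplified sketch.
        sql += " WHERE " + " AND ".join(spec.filters)
    return sql


spec = PipelineSpec(
    source_table="raw_orders",
    target_table="clean_orders",
    columns=["order_id", "amount"],
    filters=["order_id IS NOT NULL", "amount >= 0"],
)

generated_sql = render_sql(spec)
```

The point of the sketch is the division of labor: the engineer owns the spec, and the generated SQL is an artifact to be reviewed. With an LLM in place of the template, the output is less predictable, which is why the human sign-off matters.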

One of the most useful ways that GenAI helps data engineers is through documentation. Developers and data engineers are notoriously bad at documenting their work and explaining what they did. Thanks to GenAI, the work is always well-documented, which helps the data engineers scale their work and support the generation and use of even more data products.

In many ways, the advantages that DataOps brings to data engineering are similar to how Henry Ford revolutionized automobile manufacturing. Cars used to be put together by hand, which was a slow and expensive process. Ford introduced assembly lines, which dramatically sped up manufacturing and lowered the price of cars.

We’re seeing the same acceleration in data. Instead of manually building data pipelines, DataOps allows us to automate many of the most tedious data engineering tasks. And when you add AI into the equation, DataOps tools are turbocharging the creation of data products beyond what first-gen DataOps platforms could deliver.

The introduction of data products will fundamentally change our relationship with data. Instead of worrying about the quality and quantity of data, data products give us the confidence that the data is cleaned, secured, and well-governed, thereby providing new groups of users access to trusted data. And when a company backs its data products with an automated DataOps solution, look out: It may have discovered the secret to unleashing the full potential of its data.