
In many organizations, AI exists as a secret weapon. It’s essentially locked away in a box, brought out to work on a limited use case, and then put back into the box. Using AI in this limited fashion, however, forfeits the huge impact that AI could have across the business. This is where DataOps, short for data operations, can help.
DataOps borrows heavily from DevOps, which is defined partly by the tools used to automate the management, testing, deployment, and replacement of code, and partly by a culture of velocity and agility. Companies that figure out how to use DataOps techniques to build their new AI data products on unified data stores can democratize AI across their businesses and gain a potentially lucrative advantage.
It might not be obvious, but data products are everywhere. Whether it’s a BI dashboard, a machine learning model, or a drug discovery application, if it’s providing large groups of users with governed access to clean, well-managed data, then it’s a data product.
GenAI’s Impact on Data Product Design
The generative AI revolution has ushered in a new class of data products. Technological breakthroughs in large language models (LLMs), such as OpenAI’s ChatGPT and Google’s Gemini, have given companies powerful new tools, including chatbots, coding copilots, and AI agents, for improving the user experience.
We’re still in the early days of the GenAI revolution, but leading companies are already figuring out how the new capabilities can fit into their existing infrastructure. One of the most critical steps is figuring out how to provide these new AI data products with secure and governed access to your organization’s historical and operational data stores.
Companies rushing to capitalize on GenAI might be tempted to build their data infrastructure the old way. They might stand up a new data warehouse or carve out a data mart in an existing one, where they can meticulously stage and carefully prep the data that will be served to the GenAI application. They’ll be tempted to code Python transformations by hand or build customized workflows in ETL tools to get everything just perfect.
However, this approach doesn’t work in the real world. Creating additional silos of data exacerbates data consistency issues, leading to more checks and data engineering overhead. Coding data transformations by hand limits the number of connections that can be built, which restricts the impact that GenAI products can have.
Leveraging DataOps to Connect GenAI and AI Agents to Data
To increase agility and maximize the impact that AI data products can have on business outcomes, companies should consider adopting DataOps best practices. Like DevOps, DataOps encourages developers to break projects down into smaller, more manageable components that can be worked on independently and delivered more quickly to data product owners. Instead of manually building, testing, and validating data pipelines, DataOps tools and platforms enable data engineers to automate those processes, which not only speeds up the work and produces high-quality data, but also engenders greater trust in the data itself.
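To make that automation concrete, here is a minimal sketch of the kind of validation a DataOps pipeline might run automatically before promoting a batch of data. The table, column names, and thresholds are hypothetical and would be adapted to your own data contracts.

```python
import pandas as pd

def validate_orders(batch: pd.DataFrame) -> list[str]:
    """Run basic data-quality checks on a batch before it is promoted.

    Returns a list of human-readable failures; an empty list means the
    batch passes. Table and column names here are purely illustrative.
    """
    failures = []
    required = {"order_id", "customer_id", "order_total", "order_date"}
    missing = required - set(batch.columns)
    if missing:
        # Later checks assume these columns exist, so stop early.
        return [f"missing columns: {sorted(missing)}"]
    if batch["order_id"].duplicated().any():
        failures.append("duplicate order_id values found")
    if (batch["order_total"] < 0).any():
        failures.append("negative order totals found")
    if batch["order_date"].isna().mean() > 0.01:  # tolerate up to 1% nulls
        failures.append("more than 1% of rows are missing order_date")
    return failures

if __name__ == "__main__":
    batch = pd.DataFrame({
        "order_id": [1, 2, 2],
        "customer_id": [10, 11, 12],
        "order_total": [25.0, -5.0, 13.5],
        "order_date": ["2024-01-01", None, "2024-01-03"],
    })
    problems = validate_orders(batch)
    if problems:
        raise SystemExit("Validation failed: " + "; ".join(problems))
    print("Batch passed; safe to promote downstream.")
```

In a DataOps workflow, a check like this runs automatically as part of the pipeline rather than being performed by hand, which is exactly the kind of rote validation the platform takes off the engineer’s plate.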
DataOps was defined many years before GenAI. Whether it’s for building BI and analytics tools powered by SQL engines or for training machine learning models powered by Spark or Python code, DataOps has played an important role in modernizing data environments.
One could make a good argument that the GenAI revolution has made DataOps even more needed and more valuable. If data is the fuel powering AI, then DataOps has the potential to significantly improve and streamline the behind-the-scenes data engineering work that goes into connecting GenAI and AI agents to data.
The good news is that AI can significantly improve the DataOps process. Just as DataOps can help provide clean, trusted, and reliable data for GenAI data products, GenAI can be used to drive more automation into the DataOps process itself.
How a DataOps + GenAI Approach Can Simplify Data Engineering
While DataOps platforms already bring a significant amount of automation to the table, many of them still require the skills of an experienced data engineer to deliver full value. Data pipelines still need to be created, tested, validated, and secured. And every time an application changes the source data, the pipeline must be checked to ensure it hasn’t broken and won’t feed bad data into downstream systems.
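As one illustration of that checking step, here is a hedged sketch of a schema-drift check that could run before the pipeline does. The expected schema, column names, and types are hypothetical stand-ins for an actual source contract.

```python
# Hypothetical schema contract for a source table, normally kept in version
# control next to the pipeline definition. All names and types are illustrative.
EXPECTED_SCHEMA = {
    "order_id": "bigint",
    "customer_id": "bigint",
    "order_total": "decimal",
    "order_date": "date",
}

def check_schema_drift(observed: dict[str, str]) -> list[str]:
    """Compare the schema observed in the source against the expected contract."""
    issues = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in observed:
            issues.append(f"column '{column}' was dropped from the source")
        elif observed[column] != expected_type:
            issues.append(
                f"column '{column}' changed type: {expected_type} -> {observed[column]}"
            )
    for column in observed:
        if column not in EXPECTED_SCHEMA:
            issues.append(f"new, unmapped column '{column}' appeared in the source")
    return issues

if __name__ == "__main__":
    # In practice this would be read from the warehouse's information schema.
    observed = {"order_id": "bigint", "customer_id": "varchar", "order_date": "date"}
    for issue in check_schema_drift(observed):
        print("SCHEMA DRIFT:", issue)
```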
By pairing an LLM with an existing DataOps platform, new heights of automation and data-driven insight become attainable. This approach can radically expand the pool of people who are qualified to manage the DataOps platform, increasing the number of AI data products that can be pushed into production.
Instead of dedicating half a dozen or so full-time equivalent (FTE) data engineers simply to maintaining existing data pipelines, a combined DataOps/GenAI approach can radically reduce the number of FTEs needed. Broad knowledge of YAML, Python, and SQL, the intricacies of data models, how to work with data catalogs, and how to use data APIs is no longer the minimum requirement for entry.
That’s not to say that GenAI-powered DataOps platforms run themselves. Humans are still calling the shots and approving the workflows, but GenAI takes over the rote, manual work of building pipelines. Benchmarks like SWE-bench demonstrate that GenAI copilots have gotten quite good at understanding requirements and writing code, not to mention documenting the work they’ve done for validation. So why not use these new GenAI capabilities for data engineering?
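As a hedged illustration of that division of labor, the sketch below asks an LLM (via OpenAI’s Python client, as one possible choice) to draft a SQL transformation and keeps a human approval step before anything reaches the pipeline. The model name, prompt, table, and file path are assumptions made for illustration, not any specific DataOps product’s workflow.

```python
from pathlib import Path
from openai import OpenAI  # assumes the `openai` package and an API key are available

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical request; the table and columns are illustrative.
request = (
    "Write a SQL query that aggregates daily revenue per customer from the "
    "raw_orders table (columns: order_id, customer_id, order_total, order_date)."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # model choice is an assumption, not a recommendation
    messages=[
        {"role": "system", "content": "You generate SQL for a data pipeline. Return only SQL."},
        {"role": "user", "content": request},
    ],
)
draft_sql = response.choices[0].message.content

print("--- Draft transformation ---")
print(draft_sql)

# Humans stay in the loop: nothing is promoted without explicit approval.
if input("Approve this transformation for the pipeline? [y/N] ").strip().lower() == "y":
    out = Path("transformations") / "daily_revenue.sql"  # hypothetical repo layout
    out.parent.mkdir(exist_ok=True)
    out.write_text(draft_sql)
    print(f"Saved {out}; CI will still test and validate it before deployment.")
else:
    print("Rejected; nothing was changed.")
```

The point of the approval prompt is the human checkpoint: the model drafts, the engineer reviews, and the pipeline’s existing tests still run before anything is deployed.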
Thanks to the emergence of GenAI-powered DataOps platforms, it’s possible today to assign a fraction of one FTE to the job of managing data pipelines. That allows a company to dedicate more data engineering resources to challenging tasks, such as identifying valuable data and collaborating with software developers and AI engineers to share knowledge about that data.
GenAI is having a large impact on data engineering and has the potential to streamline much of the DataOps work involved in building data products. The number of potential applications for GenAI is practically limitless, but GenAI itself is constrained by one important factor: access to enterprise data. By building automated data platforms that developers can easily tap into, companies can lower the barriers to data and democratize AI across their organizations to meet business objectives.