Data products are proliferating in the enterprise, and the good news is that users are consuming them at an accelerating rate, whether it’s an AI model, a BI interface, or an embedded dashboard on a website. The bad news is that too many data engineering teams still rely on manual methods to keep these […]
Change Data Capture and the Value of Real-Time Data Integration
Business insights are only as good as the accuracy of the data on which they are built. According to Gartner, data quality is important to organizations “in part because poor data quality costs organizations at least $12.9 million a year on average.” It stands to reason, then, that providing access to the […]
It’s Essential – Verifying the Results of Data Transformations (Part 1)
Today’s data pipelines use transformations to convert raw data into meaningful insights. Yet ensuring the accuracy and reliability of these transformations is no small feat: the sheer variety of data and transformation types can make testing daunting. Transformations generally involve changing raw data that has been cleansed and validated for use by […]
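To make the idea concrete, here is a minimal sketch of verifying a transformation’s results. The `cleanse` function and the records are hypothetical examples, not tooling from the article; the point is simply that verification means asserting properties of the output against the input.

```python
def cleanse(records):
    """Example transformation: trim whitespace and lowercase email addresses."""
    return [{**r, "email": r["email"].strip().lower()} for r in records]

raw = [
    {"id": 1, "email": "  Alice@Example.COM "},
    {"id": 2, "email": "bob@example.com"},
]

transformed = cleanse(raw)

# Verify the results: no rows were dropped, and every email is normalized.
assert len(transformed) == len(raw)
assert all(r["email"] == r["email"].strip().lower() for r in transformed)
```

In practice such checks would run automatically after each pipeline stage rather than inline like this.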
Understanding the Modern Data Stack
The modern data stack is a collection of tools used to collect, store, and analyze data. Understanding the components of a modern data stack is crucial in grasping how contemporary data ecosystems function. At its core, data engineering plays a pivotal role by focusing on the practical application of data collection, storage, and retrieval. This discipline ensures […]
Data Integration Tools
Data integration tools are used to collect data from external (and internal) sources, and to reformat, cleanse, and organize the collected data. The ultimate goal of data integration tools is to combine data from a variety of sources and provide their users with a single, standardized flow of data. Use of these tools helps […]
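At its simplest, the “single, standardized flow” works like this sketch: records from two hypothetical sources with different field names are reformatted into one shared schema. The source names and fields are invented for illustration.

```python
# Two sources with inconsistent schemas (hypothetical examples).
crm_rows = [{"CustomerId": 1, "FullName": "Alice"}]
billing_rows = [{"cust_id": 2, "name": "Bob"}]

def from_crm(row):
    # Map CRM field names onto the standard schema.
    return {"customer_id": row["CustomerId"], "name": row["FullName"]}

def from_billing(row):
    # Map billing field names onto the same standard schema.
    return {"customer_id": row["cust_id"], "name": row["name"]}

# One unified stream: every record now shares {"customer_id", "name"}.
unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Real integration tools add cleansing, deduplication, and scheduling on top of this basic mapping step.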
10 Advantages of Real-Time Data Streaming in Commerce
While early science fiction shows like “Buck Rogers” (1939) and “The Fly” (1950) depicted teleportation technology, it was Star Trek’s transporter room that made real-time living matter transfer a classic sci-fi trope. While we haven’t built technology that enables real-time matter transfer yet, modern science is pursuing concepts like superposition and quantum teleportation to facilitate information transfer across any distance […]
What Are Data Products and Why Do They Matter?
Data products are software in the form of specialty tools and apps designed to support the use of data as a service. They may be as simple and straightforward as a program that converts a dataset into a visualization, or as complex as a machine learning system based on large language models (LLMs), such as ChatGPT. […]
Building Data Pipelines with Kubernetes
Data pipelines are a set of processes that move data from one place to another, typically from a data source to a storage system. These processes involve data extraction from various sources, transformation to fit business or technical needs, and loading into a final destination for analysis or reporting. The goal is to automate […]
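The extract-transform-load sequence described above can be sketched as three small functions. The in-memory source and destination here are hypothetical stand-ins; a real pipeline would read from an API or database and write to a warehouse.

```python
def extract():
    # Stand-in for reading raw records from an API, file, or database.
    return [{"amount": "12.50"}, {"amount": "7.25"}]

def transform(rows):
    # Fit the data to a technical need: parse amount strings into floats.
    return [{"amount": float(r["amount"])} for r in rows]

def load(rows, destination):
    # Stand-in for writing transformed rows to a warehouse table.
    destination.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

On Kubernetes, each stage (or the whole run) would typically be packaged as a container and scheduled as a Job or CronJob rather than called inline.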
Why Is Data Quality Still So Hard to Achieve?
We are living in an era of diverse data tools up and down the stack – from storage to algorithm testing to stunning business insights. In fact, it’s been more than three decades of innovation in this market, resulting in the development of thousands of data tools and a global data preparation tools market size that’s set […]
Testing and Monitoring Data Pipelines: Part One
Suppose you’re in charge of maintaining a large set of data pipelines that move data from cloud storage or streaming sources into a data warehouse. How can you ensure that your data meets expectations after every transformation? That’s where data quality testing comes in. Data testing uses a set of rules to check whether the data conforms to […]
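A minimal sketch of that rule-based approach: each rule is a predicate applied to every row, and failures are collected for review. The rule names and sample rows are hypothetical, not from any particular testing tool.

```python
# Each rule maps a name to a predicate over a single row.
rules = {
    "id_is_positive": lambda row: row["id"] > 0,
    "email_has_at": lambda row: "@" in row["email"],
}

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": -3, "email": "broken"},
]

# Collect (row index, rule name) for every rule a row violates.
failures = [
    (i, name)
    for i, row in enumerate(rows)
    for name, check in rules.items()
    if not check(row)
]
# failures -> [(1, 'id_is_positive'), (1, 'email_has_at')]
```

Dedicated frameworks extend this idea with declarative rule catalogs, scheduling, and alerting, but the core check-every-row-against-every-rule loop is the same.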