Testing and Monitoring Data Pipelines: Part Two

In part one of this article, we discussed how data testing can specifically test a data object (e.g., table, column, metadata) at one particular point in the data pipeline. While this technique is practical for in-database verifications – as tests are embedded directly in their data modeling efforts – it is tedious and time-consuming when end-to-end data […]

Testing and Monitoring Data Pipelines: Part One

Suppose you’re in charge of maintaining a large set of data pipelines from cloud storage or streaming data into a data warehouse. How can you ensure that your data meets expectations after every transformation? That’s where data quality testing comes in. Data testing uses a set of rules to check if the data conforms to […]

Data Observability vs. Monitoring vs. Testing

Companies are spending a lot of money on data and analytics capabilities, creating more and more data products for people inside and outside the company. These products rely on a tangle of data pipelines, each a choreography of software executions transporting data from one place to another. As these pipelines become more complex, it’s important […]

Machine Learning, Data Modeling, and Testing

By Angela Guess

Svetoslav Marinov recently wrote in Information Management, “A friend of mine recently reminded me of the notorious quote from Frederick Jelinek (the father of modern speech recognition), ‘Anytime a linguist leaves the group the recognition rate goes up.’ I remember being quite upset about it, back during my linguistics studies. Is it […]