A new press release reports, “StreamSets®, provider of the industry’s first DataOps platform, today announced an expansion of its partnership with Databricks by participating in Databricks’ newly launched Data Ingestion Network. As part of the expanded partnership, StreamSets is offering additional functionality with a new connector for Delta Lake, an open source project that provides reliable data lakes at scale. With it, users can configure their pipelines to write data from any source moving in batch or streaming mode directly into Delta Lake. Now, data teams can deliver all of their data in a shorter time frame, driving BI, analytics and ML. Today, companies require systems for diverse data applications like real-time monitoring, machine learning and data science — and that can process unstructured data like text, images, video and audio. A decade ago, data lakes replaced data warehouses as the best repositories for this raw data; however, they neither support transactions nor enforce data quality. In addition, they lack consistency, making it almost impossible to mix batch and streaming jobs and appends and reads.”
The release continues, “Leveraging the best of data warehouses and data lakes, lakehouses remedy the above limitations, but friction ingesting fresh data remains. With this partnership, Databricks users will now be able to capitalize on the new lakehouse paradigm without the friction previously encountered. They can easily connect into StreamSets Cloud and leverage out-of-the-box connectors to load batch, change data capture (CDC) or streaming data from any source (such as cloud applications, relational data, on-premises data lakes and warehouses) into Delta Lake. With StreamSets, data engineers can easily build and operate data pipelines for modern and legacy data sources to migrate to a lakehouse and continuously refresh with relevant data.”
Read more at Globe Newswire.