In this blog, we will talk about what a global water company achieved with real-time data replication.
The world’s water resources are stretched beyond all previous limits, and the toughest industrial water process challenges still lie ahead. One global water technologies and solutions company (which I’ll refer to as GWT) is taking on the challenge of reducing costs, meeting environmental regulations, and preparing for an evolving future, in part with data from its SAP systems.
Like thousands of companies around the world, GWT uses SAP ERP Central Component (SAP ECC) as its core enterprise resource planning system. At present, SAP ECC, released in 2004 but still supported today, is the most commonly deployed SAP version. SAP ECC is supported on a number of different database technologies, including Oracle, SQL Server, DB2, Sybase (now owned by SAP), and SAP HANA. SAP’s release after ECC is called S/4HANA and is only supported on the SAP HANA database. SAP is working with its customers to migrate ECC deployments to S/4HANA by the end of 2027, when regular support for ECC ends. At GWT, thousands of employees and contractors use the financial, accounting, inventory, and purchasing components to run the day-to-day business.
GWT uses SAP data in Oracle databases to forecast purchasing, materials, and inventory, which helps improve business decisions. Forecasting can include such things as predicting when chemicals in the inventory will expire and, based on government regulations, when they will need to be disposed of and replaced. GWT’s internal IT teams manage these SAP deployments on-premises. And it’s not cheap. Luckily, these systems contain a goldmine of data that GWT can use through analytics and reporting to improve processes, decrease costs, and ultimately provide the world with better, cleaner water.
Drain on Resources
GWT’s SAP data didn’t always flow into the analytics systems as freely as it does today. In the beginning, analytics queries ran directly against the source SAP system and drained its resources, which impacted daily operations. The source was also designed for many small parallel transactions, not large batch analytics, so the analytic results slowly dripped out.
This clearly wasn’t scalable, so ETL processes built on Business Objects Data Services (BODS) were used to periodically extract large volumes of data and transport them to a separate Oracle reporting instance. The process worked well for a while, certainly better than trying to always buy a bigger box. But challenges started to surface, including:
1. Bulk extracts put a massive load on the source SAP transactional database
2. Data needed to be fresher; ETL latency was too long and only worked for historical reporting, not real-time analytics
3. Detecting deletes was labor-intensive and inconsistent, resulting in some bad data
4. Analytics on Oracle was slow because tables contained a very large number of rows and columns
GWT first upgraded its analytics platform so that it could handle massive data queries. While the upgraded platform performed much better than the previous incarnation, it still required resources to manage and scale. A hosted cloud service, Amazon Redshift on AWS, was then chosen as a replacement because it could handle the same massive queries without that management burden.
Even as GWT selected some new, state-of-the-art technologies to better handle the increasing data volumes on the target system, including analytics platforms and a move to a hosted cloud service, the BODS bulk ETL extracts still had to be used to extract data from critical SAP cluster and pool tables.
Bulk ETL extracts, in this case and in cases like it, consume a lot of time and resources. Extracting these tables continued to put a heavy load on the source. Also, as I mentioned above, the BODS bulk ETL missed data, resulting in data quality issues.
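To make the delete-detection problem concrete, here is a minimal sketch of how a bulk extract typically has to find changes: by comparing two full snapshots of a table. This is an illustration only, not GWT's BODS job; the function name, the material numbers, and the row layout are all made up for the example.

```python
def diff_snapshots(previous, current):
    """Compare two full extracts of one table, each keyed by primary key.

    Both arguments are hypothetical {primary_key: row_dict} snapshots.
    """
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    # A delete is only visible as a key that is missing from the new
    # extract, so every key of the old extract has to be checked.
    deletes = sorted(k for k in previous if k not in current)
    return inserts, updates, deletes


# Hypothetical inventory rows keyed by material number.
monday = {"M-100": {"qty": 40}, "M-200": {"qty": 15}}
tuesday = {"M-100": {"qty": 35}, "M-300": {"qty": 8}}

inserts, updates, deletes = diff_snapshots(monday, tuesday)
print(inserts, updates, deletes)
```

Two things stand out in this sketch. Both snapshots have to be read in full, which is where the load on the source comes from, and any row that changed and changed back between two extracts never shows up in the result at all.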
Here Comes Log-Based Change Data Capture
To solve the issue, GWT began exploring Change Data Capture (CDC) to extract SAP data, free up resources, and ease the challenges associated with the bulk ETL extracts. As its name implies, CDC identifies changes and can then synchronize incremental changes with another system or store an audit trail of changes. CDC comes in multiple flavors, including trigger-based and log-based. (See my previous article for more information on the different methods of CDC.)
GWT decided to go with a log-based CDC solution. Of the available flavors, log-based CDC is the most broadly applicable, since it works in all of these scenarios, including systems with extremely high transaction volumes and systems that are already short on resources. It’s the approach with the lowest overhead, not only on individual transactions but also on the system overall.
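As a rough sketch of what "synchronizing incremental changes" means on the target side, the snippet below applies a stream of change events to a replica in log order. The `ChangeEvent` shape, its field names, and the `apply_changes` helper are assumptions made for illustration; real log-based CDC tools define their own event formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record read from a database transaction log."""
    seq: int                    # position of the change in the log
    op: str                     # "insert", "update", or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(target, events):
    """Apply events to the target in log order; return the last seq applied."""
    last_seq = 0
    for event in sorted(events, key=lambda e: e.seq):
        if event.op == "delete":
            # Deletes arrive as explicit events, so nothing has to be inferred.
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_seq = event.seq
    return last_seq


# Hypothetical replica of an inventory table, keyed by material number.
replica = {"M-100": {"qty": 40}, "M-200": {"qty": 15}}
events = [
    ChangeEvent(seq=2, op="delete", key="M-200"),
    ChangeEvent(seq=1, op="update", key="M-100", row={"qty": 35}),
    ChangeEvent(seq=3, op="insert", key="M-300", row={"qty": 8}),
]
checkpoint = apply_changes(replica, events)
print(replica, checkpoint)
```

The returned sequence number works as a checkpoint: a replication process that remembers it can resume from that position in the log instead of reloading the table.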
The 7 Benefits That GWT Gained from Choosing Log-Based CDC for Their Data Replication
1. Near-Zero Overhead: Reduced load on the source SAP transactional database for all tables. Load once, stream changes only
2. Lower Latency: Change data is queued off box and applied to the target once per hour, so the target lags the source far less than it did with the bulk ETL extracts
3. Improved Data Quality: Log-based CDC guarantees zero change data loss, including deletes and transient updates
4. Flexibility with Analytics: Log-based CDC can be used across heterogeneous platforms, so high-volume data environments can span on-premises systems and the cloud
5. Data Trust: Since log-based CDC only detects changes, GWT can trust that important transactions aren’t missed during lengthy, bulky ETL processes
6. Security: Industry best practices around log-based CDC allow for data to be encrypted in transit
7. Scalability: Since log-based CDC is a low-impact method of data transfer, GWT can scale up without having to worry about paying for increased resources to handle larger data queries
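The "queued off box, applied once per hour" pattern from the latency item can be sketched as grouping queued changes into hourly windows before they are applied to the target. This is a simplified illustration, not a description of GWT's tooling; the tuple layout `(commit_time_seconds, operation, key)` and the `batch_by_hour` name are hypothetical.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def batch_by_hour(events):
    """Group queued change events into hourly windows, in commit order.

    Each event is a hypothetical (commit_time_seconds, operation, key) tuple.
    Returns {window_start_seconds: [events in that window]}.
    """
    batches = defaultdict(list)
    for event in sorted(events, key=lambda e: e[0]):
        committed_at = event[0]
        window_start = committed_at - committed_at % SECONDS_PER_HOUR
        batches[window_start].append(event)
    return dict(sorted(batches.items()))


# Hypothetical queue contents, deliberately out of order.
queued = [
    (7300, "update", "M-100"),
    (120, "insert", "M-300"),
    (3500, "delete", "M-200"),
]
hourly = batch_by_hour(queued)
for window_start, batch in hourly.items():
    print(window_start, batch)
```

Because the queue lives off the source system, the batch interval can be tuned on the target side without adding any work to the SAP database.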
Log-based CDC greatly simplified GWT’s data flow and eliminated the BODS ETL overhead on the source, freeing up source SAP resources. Now those resources could be allocated to users instead of to data extraction. By using log-based CDC to replicate data from their upgraded analytics platform to the cloud (AWS), GWT was able to more easily tap into the “goldmine” of SAP data, allowing for better, more accurate, and more timely analytics and reporting to make more informed and impactful business decisions.