Extract, Transform, and Load (ETL) describes the process of integrating raw data from various data sources into a repository such as a data warehouse, with the main purpose of maintaining Data Quality and trust. ETL requires three operations, as described below by Paul Varley:
Extract: “Getting a copy of data from a source, which could be an application, database, or text file.”
Transform: “Translates the source data to match the format of the target system. This includes changing data types, combining, or splitting fields, and applying more complex formulas.”
Load: “Completes the process by putting the transformed data in the target system.”
While ETL technologies have been popular since the 1970s, ETL as a strategy plays an important role in modern times with new technologies.
Beware of introducing unintended semantic drifts when employing ETL. As data is transformed, the meaning and the context of that data changes. This could cause issues in understanding the original purpose of the data or prevent staying current with new technologies or contexts.
Other Definitions of ETL Include:
“A process of connecting to data sources, integrating data from various data sources, improving Data Quality, aggregating it and then storing it in staging data sources or data marts or data warehouses for consumption of various business applications including BI, analytics, and reporting.” (Kartik Patel)
A means to “load data from different systems into a data warehouse for reporting and data analytics.” (Paul Varley)