JOIN OUR DATA ARCHITECTURE BOOTCAMP
Save your seat for this live online training and accelerate your path to modern Data Architecture – February 27-March 2, 2023.
Click to learn more about author Gilad David Maayan.
When an enterprise takes its first major steps towards implementing Business Intelligence (BI) strategies and technologies, one of the first things that needs clarifying is the difference between a Data Mart vs. a Data Warehouse. Understanding this difference dictates your approach to BI architecture and data-driven decision making.
The goal of BI is to use technology to transform data into actionable insights and help end users make more informed business decisions, whether tactical or strategic in nature. This article clearly defines both of these important terms before elaborating on their respective use cases and architectural features.
Data Mart Defined
A Data Mart is a subject-oriented data repository that serves a specific line of business, such as finance or sales. The following are some important distinguishing features of a Data Mart:
- Contains data only from sources relevant to a particular line of business or functional unit.
- The size of a Data Mart is typically in the order of tens of gigabytes.
- Typically holds only summarized data, although some Data Marts may contain full details.
- A Data Mart costs from $10,000 to set up, and it takes 3-6 months.
- The decisions driven by the tools used on a Data Mart are tactical decisions that influence a particular department’s ways of operating.
Data Warehouse Defined
A Data Warehouse is an enterprise-wide repository of integrated data from disparate business sources, systems, and departments. For more details, see this article on types of a Data Warehouse.
Here are the features that define a Data Warehouse:
- Contains data from multiple units/subject areas within a business.
- The size of a Data Warehouse is often in the order of terabytes and is at minimum in excess of 100 gigabytes.
- The level of detail stored is high, and it includes raw data, summary data, and metadata.
- Cost often exceeds $100,000 for on-premise systems, however, the cloud computing paradigm has driven costs down with the availability of Data Warehouse as-a-service.
- Business users of specific tools attempt to use Data Warehouse information to make more informed strategic business decisions that affect the whole company.
The Classic Inmon vs Kimball Debate
The importance of differentiating between Data Marts and Data Warehouses has its roots in an ongoing debate between two contrasting data modeling approaches by Data Warehouse pioneers, Bill Inmon and Ralph Kimball.
Ralph Kimball argues that the best approach is to begin with the most important business aspects or departments, from which Data Marts oriented to specific lines of business emerge. Over time, enterprises can merge their Data Marts to form a Data Warehouse as required. Kimball’s approach is known as a bottom-up approach.
Bill Inmon argues that merely combining Data Marts is not enough. Inmon advocates for the creation of a Data Warehouse as the physical representation of a corporate data model from which Data Marts can be created for specific business units as needed.
Each approach has its merits, and a number of factors influence whether you should start with Data Marts vs. a Data Warehouse, not least the industry you operate in.
For example, an insurance company clearly needs a high-level overview from the outset, incorporating all factors that affect its business model and strategic choices, including demographics, stock market trends, claim histories, statistical probabilities, etc., so taking the Inmon approach and starting with a Data Warehouse makes most sense here.
For a small to medium-sized marketing business, it makes sense to start with a Data Mart. If that business expands to include multiple sub-divisions and lines of business, it can combine its Data Marts for each business line into a Data Warehouse later on, as per the Kimball approach.
Most databases are normalized, which means they are optimized for faster transaction times, such as adding or deleting data. Normalization works by reorganizing data so that it contains no redundant data and separating related data into tables with joins between tables that specify relationships.
Data Warehouses/Marts often use a denormalized data structure, wherein the administrators take steps to improve query performance by adding back redundant data to normalized data to decrease analytic query running times.
An important concept is extract, transform, and load (ETL). ETL extracts data from several sources, transforms the data to meet business needs using certain business rules, and finally loads (writes) data into a target system.
When starting with a Data Warehouse, you’ll typically use ETL to get data directly from source systems to the Data Warehouse, and then from the Data Warehouse to Data Marts as needed. If you take the Kimball approach and begin with Data Marts, you simply write data from relevant source systems into appropriate Data Marts before performing ETL processes to create the Data Warehouse from your Data Marts.
Due to time constraints and resources, it usually makes sense for all but the most established enterprises to start with Data Marts and develop a Data Warehouse over time. However, Cloud Computing has shortened the time and reduced the cost of building an enterprise Data Warehouse, which can provide access to a single view of truth over organizational data.
Image Credit: Pixabay