Click to learn more about author Eva Murray.
What Are the Challenges?
From flexibility to scalability and efficiency, using a data vault as your Data Modeling approach has many benefits. At the same time, there are challenges that you need to be aware of. In this blog post I'm going to walk you through the limitations and how you can overcome them.
The approach a data vault takes when modeling data (something I will go into detail on further down) results in a significantly larger number of data objects, such as tables and columns, than other approaches. There are so many more because a data vault separates information types.
As a consequence, the up-front modeling effort can be greater before the benefits mentioned above are realized. It also means that the modeling process can involve a larger number of manual or mechanical tasks to establish the flexible and detailed data model with all its components.
How Can These Limitations Be Addressed?
To avoid time-consuming manual tasks during the modeling process, architects can automate parts of the model, making it more efficient to create, update and maintain long-term.
How can they do that?
Within the data vault approach, data passes through several layers. It originates in the source systems and arrives in a staging area, where it is still modeled according to its original structure. Next comes the core data warehouse, which contains two layers: the raw vault, which allows tracing back to the original source system data, and the business vault, a semantic layer where business rules are implemented. Finally, there are data marts, which are structured based on the requirements of the business. For example, there could be a finance data mart or a marketing data mart, holding the relevant data for analysis purposes.
Out of these layers, the staging area and the raw vault are best suited to automation.
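To give an idea of what automating the raw vault can look like, here is a minimal sketch that generates hub tables from a small piece of metadata. The entity names, column names and the `SOURCE_METADATA` dictionary are hypothetical, and SQLite only stands in for a real warehouse so the example runs anywhere. Automation tools work from far richer metadata than this.

```python
import sqlite3

# Hypothetical metadata: entity name -> business key column (not from a real system).
SOURCE_METADATA = {"customer": "customer_number", "product": "sku"}


def hub_ddl(entity: str, business_key: str) -> str:
    """Render the CREATE TABLE statement for one hub from two metadata values."""
    # Identifiers are interpolated directly, so they must come from trusted metadata.
    return (
        f"CREATE TABLE hub_{entity} (\n"
        f"    {entity}_key TEXT PRIMARY KEY,\n"
        f"    {business_key} TEXT NOT NULL UNIQUE,\n"
        f"    load_ts TEXT NOT NULL,\n"
        f"    record_source TEXT NOT NULL\n"
        f")"
    )


def build_hubs(conn: sqlite3.Connection, metadata: dict) -> list:
    """Create one hub per metadata entry and return the resulting table names."""
    for entity, business_key in metadata.items():
        conn.execute(hub_ddl(entity, business_key))
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `build_hubs(sqlite3.connect(":memory:"), SOURCE_METADATA)` creates both hubs in one pass. Adding a new business entity then means adding one metadata entry rather than writing another table definition by hand.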
What Are the Characteristics of Data Vault Modeling?
The data vault modeling technique brings a high degree of flexibility by separating the business keys, which uniquely identify each business entity and do not change often, from their attributes. This results, as mentioned earlier, in many more data objects in the model, but it also provides a data model that can be highly responsive to changes, such as the integration of new data sources and business rules.
The basic structure of the model comes from the business keys and the relationships between them. Their stable nature provides the key ingredient for a robust data model, but also means the keys need to be chosen carefully, as they form the very basis from which everything else is derived.
The tables which contain the business keys are called hubs in the data vault approach. In addition to storing the keys, hubs also contain surrogate keys and metadata for each business key. Finally, the source of each business key can also be found in the hub, so that information can be traced back to its origins.
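As a rough illustration, a hub row and its loading rule could be sketched as follows. Deriving the surrogate key as a hash of the normalized business key is an assumption on my part (a common convention, but not the only option), and `HubRow`, `surrogate_key` and `load_hub` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HubRow:
    surrogate_key: str  # assumed here to be a hash of the business key
    business_key: str
    load_ts: str        # metadata: when the key was first seen (ISO 8601)
    record_source: str  # metadata: where the key was first seen


def surrogate_key(business_key: str) -> str:
    """Derive a deterministic surrogate key from a normalized business key."""
    normalized = business_key.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_key: str, load_ts: str, record_source: str) -> HubRow:
    """Insert the business key if it is new; otherwise keep the existing row."""
    key = surrogate_key(business_key)
    if key not in hub:
        hub[key] = HubRow(key, business_key.strip(), load_ts, record_source)
    return hub[key]
```

Loading the same customer number from a second source leaves the hub unchanged, so the row keeps pointing at the source where the key first appeared.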
Link tables are many-to-many join tables that connect different business keys. A link table holds the surrogate keys of the hubs it connects, its own surrogate key, and metadata about where the association originated.
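A link row can be sketched in the same spirit. The customer-to-product relationship, the column names and the hash-based keys below are all assumptions made for illustration.

```python
import hashlib


def surrogate_key(*business_keys: str) -> str:
    """Hash one or more normalized business keys into a deterministic key."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_row(customer_number: str, sku: str, load_ts: str, record_source: str) -> dict:
    """Build one row of a hypothetical customer-to-product link table."""
    return {
        "link_key": surrogate_key(customer_number, sku),  # key of the link itself
        "customer_key": surrogate_key(customer_number),   # points at the customer hub
        "product_key": surrogate_key(sku),                # points at the product hub
        "load_ts": load_ts,
        "record_source": record_source,
    }


# Sample associations: one customer buys two products, one product has two buyers.
orders = [("C-1001", "SKU-1"), ("C-1001", "SKU-2"), ("C-2002", "SKU-1")]
link = {}
for customer_number, sku in orders:
    row = link_row(customer_number, sku, "2024-01-01T00:00:00", "webshop")
    link.setdefault(row["link_key"], row)  # each association is stored once
```

The link holds no attributes of its own. It only records that an association between two business keys exists and where that association was first seen.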
With the hubs and links in place, the structure of the data vault model is set up. It does not, however, contain any attributes yet. This is where satellites come in. Satellite tables hold the attributes, together with metadata that connects them to their parent hubs and link tables. They also contain metadata about the origins of the attributes, as well as temporal attributes. This means that thanks to satellites, data architects can ensure that history is recorded at any interval, while also providing an audit trail and traceability to the source system.
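Here is a minimal sketch of how a satellite records history, again using SQLite purely so the example is runnable. The table `sat_customer`, its columns and the helper functions are hypothetical, and timestamps are stored as ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3


def create_satellite(conn: sqlite3.Connection) -> None:
    """Create a hypothetical satellite holding customer attributes over time."""
    conn.execute(
        """
        CREATE TABLE sat_customer (
            customer_key  TEXT NOT NULL,
            load_ts       TEXT NOT NULL,
            record_source TEXT NOT NULL,
            name          TEXT,
            city          TEXT,
            PRIMARY KEY (customer_key, load_ts)
        )
        """
    )


def add_version(conn, customer_key, load_ts, record_source, name, city) -> bool:
    """Insert a new row only when the attributes differ from the latest version."""
    latest = conn.execute(
        "SELECT name, city FROM sat_customer "
        "WHERE customer_key = ? ORDER BY load_ts DESC LIMIT 1",
        (customer_key,),
    ).fetchone()
    if latest == (name, city):
        return False  # nothing changed, so no new history row is needed
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        (customer_key, load_ts, record_source, name, city),
    )
    return True


def as_of(conn, customer_key, ts):
    """Return the (name, city) pair that was valid at the given timestamp."""
    return conn.execute(
        "SELECT name, city FROM sat_customer "
        "WHERE customer_key = ? AND load_ts <= ? ORDER BY load_ts DESC LIMIT 1",
        (customer_key, ts),
    ).fetchone()
```

Because rows are only ever added, never updated, `as_of` can reconstruct what the warehouse knew about a customer at any earlier point in time, and `record_source` shows which system supplied each version.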
How Does a Data Vault Work?
You have a database that enables you to work flexibly with a wide range of tools and methodologies, so you can choose the right approach for your business and overarching analytics strategy.
We fully support you in choosing the data modeling technique that best fits your strategy. This means you can easily benefit from the advantages a data vault brings.
Partners such as Datavault Builder and WhereScape have created data modeling and warehouse automation tools that integrate effortlessly with the database.
You can also build your data model directly in our database, using the UDF Framework.
Bringing Performance to Your Data Vault Modeled Data
Modeling your data in a data vault can result in complex SQL queries being executed in your data warehouse. Our architecture and pure design ensure that the outstanding performance we promise to you is sustained throughout the entire data lifecycle, and that includes your data modeling and warehousing processes.
You can audit and reproduce historical query results quickly and efficiently, load all your large data volumes into the warehouse, and invite your analysts and data scientists to run their workflows, analyses and analytical models directly in the data warehouse without sacrificing speed or reliability.
Partnerships focus on improving the user experience, which is why feedback and joint work on the continuous development and integration of products is key.
You can watch this video to get an impression of a partnership with a team using a data vault effectively.