
A Short History of Data Warehousing


by Paul Williams

The relational database revolution in the early 1980s ushered in an era of improved access to the valuable information contained deep within data. Still, improvements were needed. It was soon discovered that databases modeled for efficient transactional processing were not always optimized for complex reporting or analytical needs.

In fact, the need for systems offering decision support functionality predates the first relational model and SQL. Market research and television ratings giant ACNielsen provided clients with something called a “data mart” in the early 1970s to enhance their sales efforts.

But the practice known today as Data Warehousing really saw its genesis in the late 1980s. An IBM Systems Journal article published in 1988, “An architecture for a business information system,” coined the term “business data warehouse,” although Bill Inmon, who would become a pioneer of the practice, used a similar term in the 1970s.

The abstract for the IBM article describes the problem and the ultimate solution that spawned the modern data warehousing industry:

“The transaction-processing environment in which companies maintain their operational databases was the original target for computerization and is now well understood. On the other hand, access to company information on a large scale by an end user for reporting and data analysis is relatively new. Within IBM, the computerization of informational systems is progressing, driven by business needs and by the availability of improved tools for accessing the company data.”


[and]

“It is now apparent that an architecture is needed to draw together the various strands of informational system activity within the company. IBM Europe, Middle East, and Africa (E/ME/A) has adopted an architecture called the E/ME/A Business Information System (EBIS) architecture as the strategic direction for informational systems. EBIS proposes an integrated warehouse of company data based firmly in the relational database environment. End-user access to this warehouse is simplified by a consistent set of tools provided by an end-user interface and supported by a business data directory that describes the information available in user terms.”

In addition to Big Blue’s innovations, the onset of the 1990s saw two industry pundits gear up for further advances in the nascent world of Data Warehousing.

Bill Inmon, the Father of Data Warehousing

Considered by many to be the Father of Data Warehousing, Bill Inmon first began to discuss the principles around the Data Warehouse and even coined the term in the 1970s, as mentioned earlier. In 2007, Inmon was named by Computerworld as one of the “Ten IT People Who Mattered in the Last 40 Years.”

From the late 1970s into the 1980s, Inmon worked extensively as a data professional, honing his expertise in all manner of relational Data Modeling. Inmon’s work as a Data Warehousing pioneer took off in the early 1990s when he ventured out on his own, forming his first company, Prism Solutions. One of Prism’s main products was the Prism Warehouse Manager, one of the first industry tools for creating and managing a Data Warehouse.

In 1992, Inmon published Building the Data Warehouse, one of the seminal volumes of the industry. Currently in its fourth edition, the book continues to be an important part of any data professional’s library with a fine-tuned mix of theoretical background and real-world examples.

Later in the 1990s, Inmon developed the concept of the Corporate Information Factory, an enterprise level view of an organization’s data of which Data Warehousing plays one part. His website dedicated to the CIF serves as a repository for Inmon’s writing and white papers on all aspects of the data profession.

Inmon’s approach to Data Warehouse design focuses on a centralized data repository modeled to the third normal form. Inmon feels that strong relational modeling leads to enterprise-wide consistency, which in turn makes it easier to develop individual data marts that serve the needs of the departments using the actual data. This approach differs in some respects from that of the “other” father of Data Warehousing, Ralph Kimball.
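To make the idea concrete, here is a minimal sketch of a normalized central repository with a departmental data mart derived from it. It is an illustration only, not Inmon's own design: it assumes Python's built-in sqlite3 module, and every table, column, and view name (region, customer, customer_order, sales_mart_region_totals) is hypothetical.

```python
import sqlite3


def build_normalized_warehouse(conn):
    """Create a tiny third-normal-form repository plus one derived mart."""
    conn.executescript("""
        -- Each fact is stored once: a region name lives only in region,
        -- and a customer points at its region by key.
        CREATE TABLE region (
            region_id   INTEGER PRIMARY KEY,
            region_name TEXT NOT NULL
        );
        CREATE TABLE customer (
            customer_id   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            region_id     INTEGER NOT NULL REFERENCES region(region_id)
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            order_total REAL NOT NULL
        );
        -- Hypothetical departmental mart, derived from the central tables.
        CREATE VIEW sales_mart_region_totals AS
        SELECT r.region_name AS region_name,
               SUM(o.order_total) AS total_sales
        FROM customer_order AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        JOIN region AS r ON r.region_id = c.region_id
        GROUP BY r.region_name;
    """)
    conn.executemany("INSERT INTO region VALUES (?, ?)",
                     [(1, "EMEA"), (2, "Americas")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Acme", 1), (2, "Globex", 2), (3, "Initech", 1)])
    conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (2, 3, 50.0), (3, 2, 75.0)])


def region_totals(conn):
    """Read the mart view and return {region_name: total_sales}."""
    rows = conn.execute(
        "SELECT region_name, total_sales FROM sales_mart_region_totals"
    ).fetchall()
    return dict(rows)


warehouse_conn = sqlite3.connect(":memory:")
build_normalized_warehouse(warehouse_conn)
print(region_totals(warehouse_conn))
```

Because the mart is defined on top of the shared tables, a second department's mart would draw on the same customer and region definitions, which is the consistency argument made above.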

Ralph Kimball and his Data Warehouse Toolkit

While Inmon’s Building the Data Warehouse provided a robust theoretical background for the concepts surrounding Data Warehousing, it was Ralph Kimball’s The Data Warehouse Toolkit, first published in 1996, that included a host of industry-honed, practical examples for OLAP-style modeling. Kimball’s book was this author’s “go to” volume when working on a Data Warehouse project for a financial services company in the late 1990s.

Kimball’s early career in IT in the 1970s was highlighted by work as a key designer for the Xerox Star Workstation, commonly cited as the first commercial computer to combine a mouse with a windowed interface. In the 1980s, he gained exposure to decision support systems as a Vice President for Metaphor Computer Systems. A full-fledged Data Warehouse application served as a major product in Kimball’s own company, Red Brick Systems, founded in 1986.

Red Brick was known for its relational model suitable for high-speed Data Warehousing applications. Kimball left Red Brick in 1992 to start his own consultancy, Ralph Kimball Associates, which is now part of the Kimball Group. His well-regarded series of Data Warehouse Toolkit books soon followed. Additional volumes in the series focus on related topics, such as web-based Data Warehousing and ETL in a Data Warehousing environment, and include Microsoft-specific editions that cover SQL Server and the Microsoft Business Intelligence Toolset.

Inmon vs. Kimball – Differing Attitudes towards Enterprise Architecture

As the practice of Data Warehousing matured in the 21st Century, a schism grew between the differing architectural philosophies of Inmon and Kimball. Even calling it a schism might be overstated, as Inmon in the foreword for The Data Warehouse Toolkit called Kimball’s seminal work “…one of the definitive books of our industry. If you take the time to read only one professional book, make it this book.”

As mentioned earlier, Inmon champions the large centralized Data Warehouse approach leveraging solid relational design principles. His Corporate Information Factory remains an example of this “top down” philosophy.

Kimball, on the other hand, favors the development of individual data marts at the departmental level that are then integrated using the Information Bus architecture. This “bottom up” approach dovetails nicely with Kimball’s preference for star-schema modeling.
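For readers unfamiliar with star-schema modeling, the following is a minimal sketch of a single departmental data mart: one central fact table surrounded by dimension tables. It is an illustration under stated assumptions rather than an example from Kimball's books. It uses Python's built-in sqlite3 module, and all table and column names (fact_sales, dim_date, dim_product) are hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table with two dimensions and load sample rows."""
    conn.executescript("""
        -- Dimensions hold descriptive attributes, deliberately denormalized.
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,
            calendar_year INTEGER,
            month         INTEGER
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- The fact table holds measures plus one key per dimension.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240210, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(20240115, 1, 3, 30.0),
                      (20240115, 2, 1, 12.5),
                      (20240210, 1, 2, 20.0)])


def sales_by_category(conn):
    """Join the fact table to one dimension and total the sales amount."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
    """).fetchall()
    return dict(rows)


star_conn = sqlite3.connect(":memory:")
build_star_schema(star_conn)
print(sales_by_category(star_conn))
```

Every analytical question becomes a join from the fact table out to one or more dimensions. A second mart that reused dim_date and dim_product unchanged could be combined with this one, which is the kind of integration the bottom-up approach relies on.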

Both approaches remain core to Data Warehousing architecture as it stands today. Smaller firms might find Kimball’s data mart approach easier to implement on a constrained budget. Dimensional modeling is in many cases easier for the end user to understand, another benefit for small firms without an abundance of data professionals on staff.

Data Warehousing in the 21st Century

Many of the current changes in the data industry also affect Data Warehousing. Cloud storage and high-velocity, real-time data analysis are two obvious factors playing a role in the practice’s evolution. On the end-user side, web-based and mobile access to decision support or reporting data is a major requirement on many projects. Advances in the practice of ontology have enhanced the capabilities of ETL systems to parse information out of unstructured as well as structured data sources.

Obviously, the broad term known as “Big Data” also plays its role in modern Data Warehousing practice, with industrial-strength Data Warehouses growing to serve large enterprises. As compliance becomes more important in the wake of the Sarbanes-Oxley Act, data quality and governance have grown in relevance to the management of Data Warehouses.

Ultimately, like any aspect of the overall Data Management practice, Data Warehousing depends highly on solid enterprise integration. Whether an organization follows Inmon’s top-down centralized view of warehousing, Kimball’s bottom-up star-schema approach, or a mixture of the two, integrating a warehouse with the organization’s overall Data Architecture remains a key principle.

As the Data Warehousing practice enters the third decade in its history, Bill Inmon and Ralph Kimball still play active and relevant roles in the industry. Their seminal work in the 80s and early 90s largely defined a sector of the data profession that continues to evolve today.
