
Modern OLAP: From Static Beginnings to a Big Data Renaissance

By Chad Meley

Online analytical processing (OLAP) enables users to interactively extract insights from complex datasets by querying and analyzing data in a multidimensional way. By organizing data into dimensions and measures, OLAP allows for intuitive, immediate slicing, dicing, and pivoting to answer critical business questions as they arise.
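
To make that concrete, here is a minimal sketch in Python using pandas. The toy sales table, its dimensions (region, product, month), and its measure (revenue) are purely illustrative:

    import pandas as pd

    # Toy fact table: each row is one sale, with dimensions (region,
    # product, month) and a measure (revenue). All names are illustrative.
    sales = pd.DataFrame({
        "region":  ["East", "East", "West", "West", "East", "West"],
        "product": ["A", "B", "A", "B", "A", "A"],
        "month":   ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
        "revenue": [100, 150, 200, 120, 90, 210],
    })

    # "Slice": fix one dimension to a single value.
    east_only = sales[sales["region"] == "East"]

    # "Dice": restrict several dimensions at once.
    east_a = sales[(sales["region"] == "East") & (sales["product"] == "A")]

    # "Pivot": cross two dimensions and aggregate the measure.
    pivot = sales.pivot_table(index="region", columns="month",
                              values="revenue", aggfunc="sum")
    print(pivot)

Each operation narrows or reshapes the same fact table; the interesting engineering questions arrive only once the table is too large to scan on every query.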

OLAP has come a long way since its inception. The “O” in OLAP initially referred to data being accessible online, on a connected server, rather than stored locally on a personal computer. While groundbreaking at the time, first-generation OLAP had significant limitations, including its reliance on inflexible, precomputed datasets that quickly became stale and outdated. As datasets grew larger in scale, the prohibitive cost of storing extra aggregated copies in the traditional OLAP manner further highlighted its shortcomings. As the big data era unfolded, the original approach became increasingly impractical, and OLAP declined.

Today, we are reclaiming and reimagining the term OLAP to reflect the evolution of the technology. Now standing for “Optimized Live Analytic Processing,” OLAP has transformed to address the shortcomings of its predecessor. The focus is no longer just on access but on delivering insights from live, fresh, and active data. This shift empowers businesses to make real-time decisions with confidence, leveraging the speed, scalability, and dynamism of modern analytic systems.

The History of OLAP: A ’90s Solution with Limitations

In the 1990s, OLAP emerged as a powerful tool for making sense of datasets from burgeoning back-office applications. Technically, it relied heavily on precomputing everything before queries were executed. Data engineers would construct OLAP cubes – aggregations of data designed to provide instant answers to predefined queries. While effective for its time, this approach came with significant drawbacks.
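
As a rough sketch of what “precomputing everything” meant, the Python snippet below materializes an aggregate for every combination of dimensions of a toy table – essentially what a cube build did. The table and its columns are again illustrative:

    from itertools import combinations
    import pandas as pd

    # Toy fact table; names and values are illustrative.
    sales = pd.DataFrame({
        "region":  ["East", "East", "West", "West"],
        "product": ["A", "B", "A", "B"],
        "month":   ["Jan", "Jan", "Feb", "Feb"],
        "revenue": [100, 150, 200, 120],
    })

    dimensions = ["region", "product", "month"]

    # A cube build precomputes an aggregate for every subset of dimensions
    # (every "grouping set"), so each predefined query becomes a lookup.
    cube = {}
    for r in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            cube[dims] = sales.groupby(list(dims))["revenue"].sum()

    # Answering a predefined query is now just a dictionary lookup:
    print(cube[("region", "month")])

Note that the number of grouping sets grows exponentially with the number of dimensions, which is exactly why cube storage costs ballooned as data got wider.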

One major issue was data freshness. The OLAP cubes were typically built on a weekly schedule, often over a weekend. If new data arrived on Monday, Tuesday, or any other day after the cube was built, that data wouldn’t be reflected in analyses until the next build. This delay rendered the insights increasingly irrelevant in fast-moving industries.

Another drawback was the cost of storage. To support both granular queries and aggregated insights, OLAP systems needed to store detailed raw data alongside fully aggregated cubes. The resulting storage overhead was prohibitive, particularly as datasets grew in complexity and size.

Additionally, once data was aggregated into a cube, there was no ability to drill down into the underlying details. The whole point of aggregation was to discard data to improve performance, but this came at the cost of granularity. If a business wanted to revisit the details behind a specific aggregate, it would have to completely rebuild the cube or query the raw data, which was often impractical.

The Decline: Big Data Outgrows Traditional OLAP

As the era of big data took off, the limitations of traditional OLAP became even more pronounced. Data volumes exploded, making precomputing everything infeasible. The rigid structure of OLAP cubes clashed with the unstructured, rapidly evolving nature of big data. Many organizations pivoted to more flexible, albeit slower, querying models like those found in data lakes and distributed SQL engines. OLAP, for a time, seemed destined for obsolescence. The leading OLAP products of the day, such as Hyperion Essbase and Cognos PowerPlay, now sit on the scrap heap of legacy tech.

The Renaissance: OLAP Reimagined for Big Data

Recent innovations have spurred a renaissance in OLAP, driven by new technologies designed to handle modern big data challenges. One pivotal advancement is the materialized view: a precomputed result set that provides the speed of traditional OLAP while maintaining flexibility and scalability. Building on this, partial caching represents a significant evolution in OLAP technology. Unlike traditional systems that precompute and store all data, this approach caches only the most relevant data for frequently queried patterns. This greatly reduces storage costs and addresses the issue of stale data: by combining fresh updates with cached results, partial caching ensures real-time data freshness and delivers predictable query performance, even under demanding workloads.
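
The sketch below illustrates the partial-caching idea in simplified Python: a precomputed aggregate serves the hot query pattern, while rows that arrived after the last refresh are merged in at query time. The class and its methods are invented for illustration, not any particular product’s API:

    from collections import defaultdict

    # Illustrative only: one hot query pattern, one measure, no persistence.
    class PartialCache:
        def __init__(self):
            self.cached = defaultdict(float)  # (pattern, key) -> precomputed sum
            self.fresh = []                   # raw rows not yet folded into the cache

        def ingest(self, row):
            self.fresh.append(row)            # new data is queryable immediately

        def compact(self, pattern):
            # Periodically fold fresh rows into the cached aggregate.
            # (Handles a single pattern, for simplicity.)
            for row in self.fresh:
                self.cached[(pattern, row[pattern])] += row["revenue"]
            self.fresh.clear()

        def query(self, pattern, key):
            # Cached portion + live portion = fast answers that are never stale.
            live = sum(r["revenue"] for r in self.fresh if r[pattern] == key)
            return self.cached[(pattern, key)] + live

    pc = PartialCache()
    pc.ingest({"region": "East", "revenue": 100})
    pc.compact("region")                          # materialize the hot aggregate
    pc.ingest({"region": "East", "revenue": 50})  # arrives after the build
    print(pc.query("region", "East"))             # 150.0 -- includes the fresh row

The key property is that the expensive scan is amortized into the periodic compaction, while each query stays cheap without ever returning a stale total.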

A Common Thread: Scanning Less, Smarter

The common thread between old and new OLAP lies in figuring out how to scan fewer rows of data to deliver answers quickly. The older approach was relatively crude, often reducing millions of rows to thousands by discarding high-cardinality detail. For instance, a dataset containing daily orders might be aggregated into monthly orders, cutting the volume of data by roughly 30 times (30 days in a month). However, this approach had a significant drawback: once aggregated, the daily-level data was lost, making it impossible to drill back down.

Modern OLAP, exemplified by innovations like partial caching, offers a more flexible solution. Instead of fully aggregating data and discarding granularity, partial caching computes and stores intermediate aggregations – for example, weekly summaries. This enables querying at multiple levels of granularity – daily, weekly, or monthly – from the same dataset. When querying for monthly data, instead of scanning 30 daily records, only four or five weekly records need to be processed. This partial aggregation strikes a balance, preserving flexibility and efficiency without sacrificing data detail.
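
Here is a worked version of that example in Python with pandas; the 30 days of order counts are made up:

    import pandas as pd

    # 30 days of toy order counts (illustrative data).
    daily = pd.DataFrame({
        "date": pd.date_range("2024-01-01", periods=30, freq="D"),
        "orders": range(1, 31),
    })

    # Partial aggregation: keep weekly summaries instead of one monthly total.
    weekly = (daily
              .resample("W", on="date")["orders"]
              .sum()
              .rename("weekly_orders"))
    print(len(weekly))   # 5 weekly records instead of 30 daily rows

    # A monthly query now scans a handful of weekly records...
    print(weekly.sum())  # 465 == sum of 1..30

    # ...while daily granularity is still available from the base table.
    print(daily.loc[daily["date"] == "2024-01-15", "orders"])

The monthly answer touches five weekly records rather than 30 daily rows, and the daily table remains available for drill-down.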

Adoption at Breakneck Speed

The resurgence of OLAP is evident in its rapid adoption across industries. Companies like Uber, Stripe, and LinkedIn leverage modern OLAP systems to power real-time dashboards, personalized recommendations, and operational analytics. These organizations rely on Apache Pinot, a real-time distributed OLAP data store designed for ultra-fast query processing on large-scale datasets, to handle millions of queries per second while keeping insights actionable and up to date.
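
For flavor, a query against a Pinot broker over HTTP might look like the sketch below. The broker address, the orders table, and its columns are placeholders rather than a real deployment:

    import requests

    # Pinot brokers expose a SQL endpoint over HTTP; 8099 is the default
    # broker port in many setups. Table and column names are invented.
    BROKER = "http://localhost:8099"

    sql = """
        SELECT region, SUM(revenue) AS revenue
        FROM orders
        WHERE ts > ago('PT1H')
        GROUP BY region
        ORDER BY revenue DESC
        LIMIT 10
    """

    resp = requests.post(f"{BROKER}/query/sql", json={"sql": sql}, timeout=10)
    resp.raise_for_status()
    for row in resp.json()["resultTable"]["rows"]:
        print(row)

A query like this aggregates the last hour of events at request time, which is precisely the live, fresh behavior the first generation of OLAP could not deliver.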

From its static beginnings in the ’90s to its dynamic, big-data-driven rebirth today, OLAP has evolved into a must-have tool for any data-driven organization.