Big Data Doctrine: Warehouse vs. Data Lakes

Learn more about author Thomas Hazel.

It’s been five years since James Dixon was credited with coining the term “Data Lake.” Since then, the concept has continued to evolve and gain momentum. Data Lakes are the new kids on the block in Big Data management and stand in stark contrast to the traditional Data Warehouse solutions businesses have been employing since the 1980s. Recently this contrast has been framed, to say the least, by several narratives: good versus evil, wrong side of history, etc. Which side you’re on depends on many factors. The objective of this column, and subsequent columns, is to debunk some of the FUD (on both sides) and begin a sophisticated conversation on the complexities of Big Data management and where we go from here.

But to understand where we’re going, you need to know where we came from, and the warehouse versus lake discussion is no different. What makes a technology a true “trend,” and what makes a new trend supersede the previous one, typically follows a similar path: Hype, Belief, Adoption, Duration. History has shown that many of the more notable trends had a tough go of it before achieving general acceptance. In other words, megatrends (life-changing ones) such as “Cloud” and “Big Data” had many a naysayer before they finally changed how we do business and, more significantly, how we live our lives. But why? Is there a reason some skeptics push so hard against the new and different? Sure: some technologies are false hopes, a waste of time and money, while others are the real deal, solving real problems. This column will work to outline why some technologies become trends while others wither on the vine, and ultimately predict whether data lakes are a “fad” or a trend capable of handling today’s and tomorrow’s tsunami of data.

Without a doubt, traditional warehousing has been a trend for thirty years. That is to say, it has all the hallmarks of adoption and duration to back it up. So why are data lakes entering the conversation and being seen as a possible threat to warehouse providers? This is a many-faceted question and, in part, will be used to answer the overarching question: Why do some technologies become trends while others become fads?

In the end, there are really two reasons a technology becomes a trend: Adoption and Duration. Hype and Belief are simply the buildup before those two take hold. Hype, or more specifically the Hype Cycle, is a term Gartner uses to predict the adoption of any particular technology, and it is a mainstay for any industry.

There are five phases in this cycle (the curve) that are superimposed on a graphical chart:

  1. Technology Trigger – where the new technology first appears
  2. Peak of Inflated Expectations – where early publicity lifts user expectations
  3. Trough of Disillusionment – where adoption issues are experienced
  4. Slope of Enlightenment – where initial issues are worked through
  5. Plateau of Productivity – where the new technology is adding value

[Figure: sample Hype Cycle graph]

This chart is part of an overall report Gartner releases to the public (as paid research) and is a must-read for any technologist. However, it does not go into detail on adoption fundamentals or, for that matter, on why a technology fell off the chart (i.e., duration). In other words, what was the catalyst for a technology to be put on the list, and why will it or won’t it fall off the list in next year’s report? Just looking over the last five years of reports, new technologies have appeared at any phase of the cycle and then disappeared just as quickly. These reports are great at snapshotting the current state of hype and adoption, but whether they are a predictor of trends is less certain.

This and subsequent columns will go directly to trend predictability. In other words, if your future technology decision is based on Gartner’s Hype Cycle, you might make an ill-advised bet, and with Big Data requiring long-term decisions with long-term consequences, a misguided decision could be costly.

As mentioned, adoption and duration are the key metrics in making good, long-lasting decisions. The following Harvard Business Review articles provide more in-depth reasoning on why certain products and technologies make it into the mainstream, as well as why a smaller set have a longer shelf life.

These articles use statistical research to outline the power of disruptive technology and how it affects and changes markets. How it does so is somewhat complicated, but read from start to finish, they show how disruptive technology is the driving force of change: sometimes redefining markets, sometimes replacing them, and sometimes creating whole new sectors (e.g., a megatrend).

Now, with a stronger foundation in what drives technology adoption and duration, the pros and cons of warehousing versus lakes can be discussed. In my next column (Pros and Cons: Warehouse vs. Data Lakes), I’ll go into the thick of things and hopefully provide concrete recommendations on when, or if, your Big Data management project should stay with warehousing, consider lakes, or use both.
