
Big Data Doctrine: Warehouse vs. Data Lakes

By Thomas Hazel  /  December 2, 2015

Learn more about author Thomas Hazel.


It’s been five years since James Dixon was credited with coining the term “Data Lake.” Since then the concept has continued to evolve and gain momentum. Data Lakes are the new kids on the block in Big Data management and stand in stark contrast to the traditional Data Warehouse solutions businesses have employed since the 1980s. Recently this contrast has been framed through several narratives: good versus evil, wrong side of history, and so on. Which side you’re on depends on many factors. The objective of this column, and subsequent columns, is to debunk some of the FUD (on both sides) and begin a sophisticated conversation about the complexities of Big Data management and where we go from here.

To understand where we’re going, though, you need to know where we came from, and the warehouse versus lake discussion is no different. What makes a technology a true “trend,” and what makes a new trend supersede the last, typically follows a similar path: Hype, Belief, Adoption, Duration. History has shown that many of the more notable trends had a tough go of it before achieving general acceptance. In other words, megatrends (life-changing ones) such as “Cloud” and “Big Data” had many a naysayer before they finally changed how we do business and, more significantly, how we live our lives. But why? Is there a reason some skeptics push so hard against the new and different? Sure: some technologies are false hopes, a waste of time and money, while others are the real deal, solving real problems. This column will outline why some technologies become trends while others wither on the vine, and ultimately predict whether data lakes are a “fad” or a trend capable of handling today’s and tomorrow’s tsunami of data.

Without a doubt, traditional warehousing has been a trend for thirty years. That is to say, it has all the hallmarks of adoption and duration to back it up. So why are data lakes entering the conversation and being seen as a possible threat to warehouse providers? This is a many-faceted question and, in part, will be used to answer the overarching question: Why do some technologies become trends while others become fads?

In the end, there are really two reasons a technology becomes a trend: Adoption and Duration. Hype and Belief are simply the buildup before the last two take hold. Hype, or more specifically the Hype Cycle, is Gartner’s term for the model it uses to predict adoption of a particular technology, and it has become a mainstay in the industry.

There are five phases in this cycle (the curve) that are superimposed on a graphical chart:

  1. Technology Trigger – where the new technology first appears
  2. Peak of Inflated Expectations – where early publicity lifts user expectations
  3. Trough of Disillusionment – where adoption issues are experienced
  4. Slope of Enlightenment – where initial issues are worked through
  5. Plateau of Productivity – where the new technology is adding value

[Figure: sample Hype Cycle graph]

This chart is part of an overall report Gartner publishes as paid research, and it is a must-read for any technologist. However, it does not go into detail on adoption fundamentals or, for that matter, on why a technology fell off the chart (i.e., duration). In other words, what was the catalyst for a technology to be put on the list, and why will it or won’t it fall off the list in next year’s report? Looking over just the last five years of reports, new technologies have appeared at any phase of the cycle and then disappeared just as quickly. These reports are great at snapshotting the current state of hype and adoption, but whether they are a predictor of trends is far less certain.

This and subsequent columns will go directly to trend predictability. In other words, if your future technology decision is based on Gartner’s hype cycle, you might make an ill-advised bet, and because big data requires long-term decisions with long-term consequences, a misguided decision could be costly.

As mentioned, adoption and duration are the key metrics in making good, long-lasting decisions. The following Harvard Business Review articles provide more in-depth reasoning on why certain products and technologies make it into the mainstream, as well as why a smaller set has a longer shelf life.

These articles use statistical research to outline the power of disruptive technology and how it affects and changes markets. How it does so is somewhat complicated, but if one reads them from start to finish, one will see how disruptive technology is the driving force of change: sometimes redefining markets, sometimes replacing them, and sometimes creating whole new sectors (e.g., a megatrend).

Now, with a stronger foundation in what drives technology adoption and duration, the pros and cons of warehousing versus lakes can be discussed. In my next column (Pros and Cons: Warehouse vs. Data Lakes), I’ll go into the thick of things and hopefully provide concrete recommendations on whether and when your Big Data management project should stay with warehousing, consider lakes, or use both.

About the author

Thomas Hazel, founder, chief scientist and CTO at Deep Information Sciences, is an avid inventor and serial entrepreneur. Over the last 20 years, he has been at the forefront of communication, virtualization, and database science and technology. Prior to founding Deep Information Sciences, he was Chief Architect at startups Akiban and Virtual Iron. Thomas is also the author of several popular open source projects, one of which is a database he sold to Oracle. Thomas has patented many inventions in the areas of distributed processes, virtualization and database science. He holds a degree in Computer Science from the School of Engineering at the University of New Hampshire, and founded the UNH and Deep chapters of the Association for Computing Machinery.
