The Next Generation Logical Data Warehouse: It’s Time to Democratize the Data

By Michele Iurillo

Not long ago, a Business Intelligence (BI) infrastructure was a complex affair: data sources, normalization, ETL, OLAP cubes, and a physical Data Warehouse hosted on a physical server. Technology has advanced considerably, and the landscape is changing due to two factors in particular: the Cloud and Virtualization. But that's not all: trends in BI point to other factors that will become important in the coming months.

According to Gartner:

“Most business users will have access to self-service tools to prepare data for analysis. Most independent self-service data preparation offerings have been expanded into end-to-end analytics platforms or integrated as features into existing analysis platforms. Intelligent, governed, Hadoop-based, search-based, visual and intelligent data discovery will become a unique form of next-generation data discovery that will include self-service data preparation and natural language generation.”

Also as stated by Gartner:

“Organizations are embracing self-service analytics and business intelligence (BI) to bring these capabilities to business users at all levels. This trend is so pronounced that Gartner, Inc. predicts that by 2019, the analytical output of business users with self-service capabilities will outperform that of professional data scientists.”

Logical Data Warehouse: A New Approach

What is a Logical Data Warehouse? To understand the logic behind the Logical Data Warehouse, it is necessary to examine what a Data Warehouse of a traditional enterprise actually is.

“A Data Warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end-users in a way that they can understand and use in a business context.” (Source: Barry Devlin, Data Warehouse: From Architecture to Implementation)

Is a Data Warehouse a single physical database? Well, no! A Data Warehouse (DW) can be a representation of a heterogeneous set of data sources, each of which carries a portion of the enterprise data used for transactions or business analysis. The Logical Data Warehouse is an architectural style that represents data from various data sources.

In the traditional Enterprise Data Warehouse (EDW) scenario, data usually comes from transactional databases, line-of-business applications, CRM systems, ERP systems, or other data sources. This data is standardized, cleaned, and transformed through an ETL (extract, transform, load) process to ensure reliability, consistency, and accuracy across the enterprise before it is loaded into the Data Warehouse. This process provides a stable and secure data platform from which Data Scientists and information workers can perform complex analyses and generate reports.
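The classic ETL pipeline described above can be sketched in a few lines. This is a minimal, illustrative example only; the source rows, field names, and in-memory SQLite "warehouse" are all hypothetical stand-ins for real systems.

```python
# Minimal ETL sketch (illustrative): extract rows from a hypothetical
# CRM export, standardize them, and load them into an in-memory
# SQLite table playing the role of the warehouse.
import sqlite3

def extract():
    # Stand-in for pulling raw records from a transactional source.
    return [
        {"customer": " Acme Corp ", "revenue": "1200.50"},
        {"customer": "Globex", "revenue": "980.00"},
    ]

def transform(rows):
    # Cleaning/standardization step: trim names, cast types.
    return [(r["customer"].strip(), float(r["revenue"])) for r in rows]

def load(rows, conn):
    # Persist the cleaned rows into the central store.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT customer, revenue FROM sales").fetchall())
```

The key point is that every record is copied and reshaped *before* anyone can query it, which is exactly the latency the Logical Data Warehouse tries to avoid.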

Today, the EDW is increasingly obsolete and ineffective given the volume, variety, and velocity of big data coming from the Cloud, social networks, mobile devices, and IoT, spread across global sites in a multitude of formats. Add to this the expectation that all of it will be accessible, meaningful, and ready to be consumed by any self-service BI application in real time or near real time. By the time an EDW project like the one described above is implemented, it has often lost its relevance to current business needs.

As a BI consultant, I have seen many well-designed projects whose implementation was complex and time-consuming because of the large "funnel" called ETL. Data normalization before loading is another critical stage in any project.

More and more organizations seeking to tame this avalanche of wild data are turning to a logical architecture that abstracts the inherent complexities of big data using a combined approach of Data Virtualization, Metadata Management, and distributed processing.

The Logical Data Warehouse architecture combines all these elements while including and transcending the capabilities of EDW.

The new concept of the Logical Data Warehouse will allow IT departments to offload their BI-related tasks and responsibilities. The era of the true CIO (Chief Information Officer) has finally arrived.

The logical layer provides (among other things) various mechanisms for viewing data in the DW and elsewhere in the enterprise without relocating and transforming it ahead of display time. In other words, the Logical Data Warehouse complements the traditional central warehouse (and its primary function of a priori aggregation, transformation, and persistence of data) with functions that search and transform data in real time.

The advantage of the logical layer is that the data is fresher (as time-sensitive business processes require) and the structure of the supplied data is created at runtime (as discovery-oriented analysis requires), without limiting the data to pre-built DW structures. Achieving these benefits was a challenge in the past: software, hardware, and networks simply lacked the required speed, scale, and reliability, and ad hoc installations were large and complex.

Data Virtualization provides a single integrated view of data from distributed sources in real time or near real time, regardless of the type or location of the data and whether it is structured, semi-structured, or unstructured. When a Logical Data Warehouse is powered by a complete Data Virtualization product, with distributed processing that pushes work down to the source system where the data resides (whether a Hadoop cluster, a CRM system, or an EDW), the dance of released data begins.
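The contrast with ETL can be made concrete with a small sketch of the virtualization idea: a "virtual view" that federates live sources at query time instead of copying data into a central store first. The two source functions below are hypothetical stand-ins for a CRM system and a transactional database.

```python
# Sketch of the data-virtualization idea (hypothetical sources):
# a virtual view that joins two live sources at request time,
# with no ETL and no persisted copy of the data.
def crm_source():
    # Stand-in for a CRM system queried on demand.
    return [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]

def orders_source():
    # Stand-in for a transactional database queried on demand.
    return [{"customer_id": 1, "total": 500.0},
            {"customer_id": 1, "total": 250.0}]

def virtual_customer_orders():
    # The join and aggregation happen at query time, so results
    # reflect whatever the sources hold right now.
    customers = {c["id"]: c["name"] for c in crm_source()}
    result = {}
    for o in orders_source():
        name = customers[o["customer_id"]]
        result[name] = result.get(name, 0.0) + o["total"]
    return result

print(virtual_customer_orders())  # fresh on every call
```

A real Data Virtualization product would additionally push the aggregation down into each source system rather than pulling raw rows, but the architectural point is the same: the consumer sees one integrated view, and nothing is copied in advance.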

The Logical Data Warehouse in Today’s Terms

The need for self-service BI in modern Data Management cannot be overstated, and thus the ability to have a “self-service Logical Data Warehouse where up to 100 different sources can be connected a few minutes after installation” is certainly important.

What does this mean in practice? The starting point for such tools, according to Piotr Czarnas, is to understand what the analytical process looks like in large companies:

“There are many areas in business where companies want to do data analysis. About customers, the status of orders, anything. For example, you have 50 people in your organization who consume these reports in some way. If you want to generate such reports, there are two problems: firstly, buying a license for the appropriate software to load the data and a warehouse to hold it, and secondly, buying a database.”

So, it’s not that easy.

“Even if someone buys a database and spends money on a license, it will take 6 to 8 months before the report is generated, because the data has to be uploaded into the central database. That, in turn, requires developers, who cost money. Only when all this happens will the data analytics teams have somewhere to get the data from. As a result, the report arrives after six months, when the business has already forgotten what it asked for. That is the practice in large companies, and it costs a lot.”

Taking Data-Driven Seriously

There is still a lot of talk about Data-Driven Companies, but very few tools are concerned with avoiding tedious ETL processes. It is easy to work on the Front-End, but much more difficult on the Back-End. Organizations need a solid Business Intelligence Front-End that can connect to relational SQL sources. QlikView, TARGIT, Power BI, Tableau, and even Excel can give you results from an SAP source in a few minutes and four clicks, without ETL; the only obligation is Data Modeling: telling the Front-End which fields in a table are measures and which are dimensions, for example.
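That data-modeling step, declaring which fields are dimensions and which are measures, can be sketched as a tiny semantic model. The field names and rows below are hypothetical; the point is that once the model is declared, the front end can aggregate without any ETL.

```python
# Sketch of the "data modeling" obligation (hypothetical fields):
# declare dimensions vs. measures, then aggregate accordingly.
rows = [
    {"region": "EMEA", "product": "A", "revenue": 100.0},
    {"region": "EMEA", "product": "B", "revenue": 200.0},
    {"region": "APAC", "product": "A", "revenue": 150.0},
]

# The semantic model: which fields to group by, which to sum.
model = {"dimensions": ["region"], "measures": ["revenue"]}

def aggregate(rows, model):
    # Group by the declared dimensions, sum the declared measures.
    out = {}
    for r in rows:
        key = tuple(r[d] for d in model["dimensions"])
        for m in model["measures"]:
            out[key] = out.get(key, 0.0) + r[m]
    return out

print(aggregate(rows, model))
```

Changing the model (say, adding "product" to the dimensions) changes the report immediately, with no pipeline to rebuild, which is the self-service behavior the article describes.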

This gives you the possibility to drive decisions using current data, whereas traditional Data Warehouses usually offer access only to information from the past. A next-generation Logical Data Warehouse will allow you to point at a data source and set whether it is loaded once a day, at night, in the morning, into the Cloud, or anywhere else. Everyone can have access to it, where and when they want.

Democratize Data Analysis

As a result, an employee can retrieve data from, e.g., Google Analytics or Facebook, and everything lands in one database. Of course, many of these things can be checked manually, but here everything is pulled automatically. Next-generation LDWs allow Data Scientists to manage all their information without having to rely on the technological infrastructure. This is a dream come true.
