Advances in Data Warehouses

By on

Data warehouses have advanced in the past few years, adding multiple enhancements and new capabilities. A data warehouse stores business data from a variety of applications and databases. It acts as a single repository, which an organization can access with BI (business intelligence) and analytics tools, before making decisions. A data warehouse provides faster processing because its use of columns in storing data allows users to quickly access only relevant data elements.

Businesses can set up a data warehouse on-premises, within a cloud, or in combination to form a hybrid cloud relationship.

Over time, a data warehouse accumulates historical records which become invaluable to business analysts and data scientists.

The belief that data warehouses are not scalable is “old information.” In the 1990s that was true, but by 2008, solutions for warehouse scalability limitations were being offered. Facebook began using a NoSQL warehousing system in 2008. By 2010, more and more data warehouses were becoming scalable. With the restriction of a limited memory removed, data warehouses began evolving in significant and useful ways.

Cloud-based data warehouses provide user-friendly controls, hiding layers of complexity. The different elements involved in data warehouses, such as databases, ELT (extraction, loading, and transformation), data mining, and analytics engines, allow users access through the use of relatively simple interfaces (in most cases).

Jerod Johnson, technology evangelist at CData Software, said:

“With different data pieces being stored in different locations, the data warehouse provides that single point of contact for your data, regardless of what kind of data it is.”

In-Database Analytics

In-Database Analytics describes a new technology that allows data processing to happen within the database. It does this by constructing the analytic logic within the database itself. In-database analytics eliminates the human factor by automatically transforming data. 

An in-database analytics system is an analytic database platform that has a data warehouse integrated into it. In-database analytics is commonly used in applications that need intensive processing. Advantages are:

  • Streamlines the identification of potential business opportunities and predicts risks and trends
  • Improves a business’s predictive analytics
  • Allows users to create reports, examine data details, and provide ad hoc analysis reporting

AI and Machine Learning Operations

Cloud data warehouses have begun using advanced machine learning algorithms in their software, which avoids certain problems common to on-premises data warehouses. The combination of machine learning algorithms and cloud-based solutions allows companies to scale and manage significant amounts of data more easily and efficiently.

The machine learning algorithms adjust quickly to new patterns, giving data teams and other staff the ability to access analytical insights and updated information about changing business processes. Easy access to insight, analytics results, and updated information can improve decision-making and business efficiency throughout the company.

Two years ago, Honeywell initiated a data warehouse strategy designed to send their transaction data from all sources to a single warehouse. Using AI, more than 100,000 contracts were reviewed automatically with a focus on inflation and pricing issues.

Additionally, the combination of a single data warehouse, combined with artificial intelligence, can help with managing inventory, as well as recycling.

Many businesses would like to improve their customer service with the use of natural language processing (NLP).  Artificial intelligence and NLP are being combined with data warehouses to provide ML training. Combining AI, data warehouses, and NLP supports a quick analysis of data, and offers opportunities for growth in marketing, sales, and support.

Data warehouses can store huge amounts of both structured and unstructured data, which can be analyzed using these AI/NLP platforms.

An analysis of this NLP data (a form of data mining) can improve real-time responses by “bots.” (Bots are software programs that perform automated, predefined tasks, such as answering questions.) Bots normally imitate or replace humans for limited nonphysical tasks. Because they are automated, they operate much faster than human users. They typically provide information in the form of live chat assistance, and use responses based on past conversations with customers. This type of data mining is challenging without the stability of a data warehouse.

Bots can help businesses handle staffing shortages or large amounts of work.

Recent Trends in Data Warehouses

As businesses attempt to both develop an understanding of their customer base and streamline their business processes, many are turning to the use of data warehouses for processing the analytics. Understanding recent advances in the evolution of data warehouses can be the difference between keeping up with the competition and falling behind. Recent advances are:

  • Shifting to a Single Data Warehouse: Several companies have developed multiple data warehouses, often using different architectures. This is expensive and may require additional staff to access, transform, and transfer the data. Consolidating all data into a single store can save money and the business can become more efficient.
  • Increased Usage of SaaS: The data-warehouse-as-a-service (DWaaS) is gaining in popularity. The freedom and cost-effectiveness of this service makes it an incredibly useful option for many organizations. The primary advantages of the SaaS model include user accessibility, security, and simplicity.

Enter the Data Lakehouse

The two main types of large-scale data storage that have been used for the last two decades are data lakes and data warehouses. The data lakehouse merges the data warehouse model with the data lake model, with the goal of resolving each model’s limitations and weaknesses.

The lakehouse system offers low-cost storage for massive amounts of raw data, just like a data lake. It also brings structure to the data, and supports Data Management by using a metadata layer on top of the storage, much like data warehouses.

The data in lakes is disorganized, and often, some it will have been stored for a few years or more, leading to the problem of data stagnation (useless data). Data warehouses, on the other hand, are less efficient and more expensive than data lakes, and cannot handle unstructured data like video, audio, and social media posts. The data lakehouse architecture resolves these issues, and supports a data platform with analytics, Data Management, data engineering, and machine learning.

This architectural arrangement allows different data teams to use one system to access all their data.

The governance feature in the data lakehouse is based on a centralized catalog with detailed access controls, and supports a user-friendly data discovery process. Data lakehouses use the same SQL interfaces as traditional data warehouses, allowing analysts to use existing SQL and BI tools without interrupting their workflows. For example, popular BI services such as Qlik, Tableau, PowerBI, and Looker can connect to lakehouse systems with relative ease.

Data lakehouses also provide direct access and support for Python and AI frameworks.

The technologies used to set up a data lakehouse are open source (for example, Hudi, Delta Lake, and Iceberg). The providers of data lakehouses include:

  • Databricks
  • Dremio
  • Starburst
  • Hewlett-Packard
  • Infor

Data lakehouses lower cloud costs by eliminating the need for expensive data warehouses, ETL/ELT workflows, extensive DataOps, piecemeal security, and Data Governance, giving you peace of mind. So, unlike data warehouses, the lakehouse system can store and process lots of varied data at a lower cost, and unlike data lakes, that data can be managed and optimized for SQL performance.

There is a strong probability data lakehouses will replace data lakes and data warehouses for most analytics projects.

Image used under license from

Leave a Reply