Data lakehouses promised a unified, scalable future for analytics and artificial intelligence (AI). But for many organizations, that promise is breaking down. Slow queries, siloed systems, and rising cloud costs are turning potential into pain. Why? Because storage alone isn’t an effective strategy.
Data lakehouses combine the best features of data lakes (infinite storage, with support for unstructured data) with the best features of classic data warehouses (advanced support for structured data). But on their own, data lakehouses cannot effectively support many of today’s demands: AI-ready data, operational data management, and true self-service data access for business users. Optimization is no longer an aspiration. It’s an essential ingredient.
AI-Ready Data
AI applications, including generative AI (GenAI) applications, are extremely data-driven, in the sense that the quality and accuracy of their responses are almost completely dependent on the quality and accuracy of the data available to them. Garbage in, garbage out applies here, and in a very direct way. AI applications need data that is not only accurate and appropriate to the task at hand, but also as up to date as possible. For an AI application to leverage data immediately in response to a prompt, that data needs to be accurate, trusted, governed, complete, and delivered in real time, across the many different kinds of applications and data sources that may be available to it.
Data lakehouses cannot easily support AI-ready data because some of the data that an AI application might need is stored outside the lakehouse. This could be because of legacy systems that haven’t yet been integrated; because the company manages multiple cloud systems, each chosen for a different vendor’s advantages; or because one or more data sources are governed by data privacy laws that restrict data export. A single data lakehouse might also serve multiple applications, each with its own proprietary data source. For these reasons, data lakehouses cannot deliver such data in real time: it would first need to be integrated and semantically unified, which takes time. Integration takes time because the extract, transform, and load (ETL) processes that data lakehouses rely on deliver data in scheduled batches. Semantic unification takes time because it requires manual development; out of the box, lakehouses cannot automatically transform data as required, in real time.
Operational Data
Like AI-ready data, operational data is inherently real-time. It does not always have to be processed in real time, but because it controls the interaction of many moving parts, real-time delivery is a critical requirement. Consider applications such as digital twins, weather-dependent processes, or smart factory management.
Data lakehouses cannot easily support operational use cases for the same reasons that they cannot easily enable AI-ready data: the data these use cases depend on often cannot be integrated in real time, in a semantically unified fashion.
True Self-Service Data Access for Business Users
Many interfaces, services, and applications are advertised as being “self-service,” yet they still require users to perform a few manual steps, or to possess some knowledge of database administration, SQL, or other technical skills. The dream, however, is for business users to engage with data just as easily as they engage with an online store. And if they don’t know exactly what they’re looking for, they can ask a bot for assistance.
This type of self-service data access requires, once again, real-time data that is semantically unified. Business users need to be able to access data in their own language, not “database speak.” And if they need a bot’s assistance, this brings us back to AI-ready data: the bot requires governed, trusted data delivered in real time. These abilities, once again, are not natively available in data lakehouses.
Logical Data Management for the Data Lakehouse
If organizations were to add logical data management layers to their data lakehouses, they would enable their lakehouses to deliver governed, trusted data from within the lakehouse and beyond, in real time. With a logical data management layer, lakehouses can easily support AI-ready data, operational use cases, and self-service data access for business users. Let’s explore how this works.
But before I dive into that, I want to summarize the five essential business benefits that logical data management provides:
- Boosted Performance: Quicker queries and faster data processing
- Lower Costs: More efficient use of storage and compute resources
- Stronger Data Governance: Improved quality, consistency, and compliance
- Seamless Scalability: Effortless growth to handle larger, more complex workloads
- Expanded Flexibility: Support for a wider range of analytics and AI-driven use cases
Logical data management enables organizations to manage data without first having to replicate it into a common repository. Replication is sometimes necessary, but requiring it for every data-centric project is not only time-consuming, since the data needs to be delivered via scheduled batches, but also costly, since the data needs to be housed in multiple locations. Using data virtualization, logical data management creates views of data that are independent of the underlying storage technology, yet change immediately in response to changes at the storage layer. This architecture establishes a real-time data-access layer above the underlying data sources, capable of handling high data volumes, that abstracts data consumers, including applications, from the complexities of accessing the individual sources. To query data from the sources, consumers need only query the logical layer, which retrieves views of the required data in real time. This separation of the logical layer from the physical layer means that logical data management can easily scale to accommodate any data source, including hyperscale cloud platforms.
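To make the idea concrete, here is a minimal sketch in Python of how a logical view behaves. Two in-memory SQLite databases stand in for independent sources (a CRM and an ERP), and the “view” is simply a function that reads them live at query time. All names and schemas here are hypothetical, and a real logical data management platform would define such views declaratively and at scale; this only illustrates the no-replication principle.

```python
import sqlite3

# Two independent "sources" (hypothetical names and schemas).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 45.5)])

def customer_revenue():
    """A 'logical view': joins live data from both sources on demand.

    No replication takes place: each call reads the sources' current
    state, so the view reflects source changes immediately.
    """
    totals = {cid: amt for cid, amt in erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")}
    return [(name, totals.get(cid, 0.0)) for cid, name in crm.execute(
        "SELECT id, name FROM customers")]

print(customer_revenue())                     # [('Acme', 200.0), ('Globex', 45.5)]
erp.execute("INSERT INTO orders VALUES (2, 100.0)")
print(customer_revenue())                     # reflects the change, no ETL run
```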
This architecture enables two powerful benefits: a universal semantic layer that can automatically translate the required data into the terminology required by the end user or application, and end-to-end data governance, data quality, and security across all applicable data sources, without impeding real-time access. With these two benefits, organizations can support a much wider range of analytics and AI-driven use cases than they can with a data lakehouse alone.
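The following sketch shows both benefits in miniature, continuing the hypothetical CRM/ERP schema from above: a semantic model maps business-friendly terms to physical source columns, and a governance rule is enforced once, at the logical layer, rather than separately in each source. The mapping shape and role names are illustrative assumptions, not any product’s API.

```python
# Hypothetical semantic model: business terms on the left,
# physical source columns on the right.
SEMANTIC_MODEL = {
    "customer name": ("crm", "customers.name"),
    "total revenue": ("erp", "SUM(orders.amount)"),
}

def resolve(term: str, role: str) -> str:
    """Translate a business term into a physical column reference,
    applying a single, centrally defined governance rule: only the
    finance role may see unmasked revenue figures."""
    source, column = SEMANTIC_MODEL[term]
    if term == "total revenue" and role != "finance":
        return f"{source}: <masked>"
    return f"{source}: {column}"

print(resolve("total revenue", "analyst"))    # erp: <masked>
print(resolve("total revenue", "finance"))    # erp: SUM(orders.amount)
```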
For AI-ready data, logical data management layers enable organizations to easily add the business-friendly metadata that AI applications need to understand data and use it appropriately.
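As a rough sketch of what such metadata might look like, the catalog entry below attaches a plain-language description and a governance signal to a logical view, so an AI application can decide which view answers a prompt. The catalog fields and the naive keyword matching are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical catalog: business-friendly metadata an AI application
# can read to select trusted data for a given question.
CATALOG = [
    {
        "view": "customer_revenue",
        "description": "Total revenue per customer, live from CRM and ERP.",
        "owner": "sales-ops",
        "certified": True,          # governance signal: trusted for AI use
    },
]

def views_for(question: str):
    """Naive retrieval: match prompt keywords against view descriptions,
    returning only certified (governed, trusted) views."""
    words = question.lower().split()
    return [e["view"] for e in CATALOG
            if e["certified"]
            and any(w in e["description"].lower() for w in words)]

print(views_for("Which customer generated the most revenue?"))
# ['customer_revenue']
```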
One additional benefit that a logical data management layer brings to a data lakehouse is worth mentioning here: because it manages diverse data in real time, it can also implement granular control over usage, to more effectively manage expenses, which is especially important in cloud environments.
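A minimal sketch of that usage control, with hypothetical consumer names and budgets: because every query passes through the logical layer, a per-consumer budget can be checked (and an audit line logged) before a query ever reaches, and incurs cost on, the underlying cloud sources.

```python
import time
from collections import defaultdict

# Hypothetical per-consumer query budgets, enforced at the logical layer.
QUERY_BUDGET = {"bi_dashboard": 1000, "ad_hoc_analyst": 50}
usage = defaultdict(int)

def execute(consumer: str, query: str):
    """Admit a query only if the consumer is within budget; log it."""
    if usage[consumer] >= QUERY_BUDGET.get(consumer, 0):
        raise PermissionError(f"{consumer} exceeded its query budget")
    usage[consumer] += 1
    print(f"[{time.strftime('%H:%M:%S')}] {consumer}: {query}")  # audit log

execute("ad_hoc_analyst", "SELECT * FROM customer_revenue")
```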
Case Study: An Energy Technology Company
A U.S.-based energy solutions company specializing in digitally powered, advanced technologies used a data lakehouse as its central data platform. Although the lakehouse was powerful in many areas, select workloads dedicated to specific SaaS applications were not sufficiently performant, so the company implemented a logical data management layer above the lakehouse. By adding this layer, the company:
- Accelerated SaaS workloads without exposing sensitive data
- Applied dynamic access control and usage monitoring
- Transformed data on the fly for app-specific formats
- Cached selectively to optimize performance and cost
This resulted in faster insights, stronger control, and a more flexible data environment.
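As a rough illustration of the selective-caching point above (a sketch, not the company’s actual implementation), the snippet below caches only the slow SaaS-backed view, and only for a short time-to-live, so faster sources remain fully live. The function names and TTL are hypothetical.

```python
import time

_cache: dict = {}
TTL_SECONDS = 300  # hypothetical: cache SaaS results for five minutes

def fetch_saas_view(name: str):
    """Serve a cached copy if fresh; otherwise hit the slow source once."""
    entry = _cache.get(name)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                       # cache hit: no SaaS call, no cost
    rows = expensive_saas_call(name)          # cache miss: one source round trip
    _cache[name] = (time.time(), rows)
    return rows

def expensive_saas_call(name: str):
    time.sleep(0.1)                           # stand-in for a slow API call
    return [("row", name)]

print(fetch_saas_view("opportunities"))       # slow: goes to the source
print(fetch_saas_view("opportunities"))       # fast: served from cache
```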
The Next-Generation Data Lakehouse
The future of the data lakehouse architecture is not confined to storage; it is defined by strategy, delivered at speed. When a logical data management layer is added to a data lakehouse, the lakehouse becomes capable of quickly supporting the many and varied historical and operational use cases of today and tomorrow, empowered by a strategy that results in faster insights, more effective decisions, and fewer compromises.