The Evolution of Data Virtualization: From Data Integration to Data Management

By Ravi Shankar

Data virtualization was first introduced two decades ago. Since then, the technology has evolved considerably, and the data virtualization of yesterday bears little resemblance to the data virtualization of today. This is due to several factors: the limitations of legacy infrastructure, the massive amounts of structured and unstructured data that organizations collect, and the shift from merely integrating data to remove silos to today’s mission of managing it so it can be used to its fullest. Let’s start with the first issue.

The Limitations of Legacy Infrastructure and How to Overcome Them

Data virtualization, which provides real-time access to data without physical replication, was originally offered as an alternative to traditional batch-oriented data integration solutions, which physically collected data into a single monolithic repository. The data warehouse became the preferred single repository for analytical data, with operational data stores serving this function for operational and transactional data. However, with the rising volumes of unstructured data, data lakes became the promising new repository that, unlike earlier solutions, could support both structured and unstructured data. 

Soon, however, companies realized that data lakes did not eliminate all of their data challenges. True, the data was all in one place, but it was stored in multiple formats, which left it effectively siloed. Wise companies and analysts recognized that some data was always going to be siloed, and they looked to technologies like data virtualization, which could connect to diverse data sources without first having to collect the data into a single place.

In this way, data virtualization employs a logical data integration principle, rather than a physical, monolithic one. Gartner says, “Enterprise and software architects are familiar with the myth that monoliths are simpler. More often than not, monoliths are inherently complex and fragile due to unnecessary dependencies. As architectural principles, modularity and decoupling don’t seem to be compatible with a single platform for everything.” As companies realize the ultimate unsustainability of trying to physically unify data in a single, monolithic repository, the logical principle is rapidly gaining steam.  
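To make the logical principle concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular data virtualization product works: the two sources (an in-memory SQLite table of customers and a CSV text of orders), the class name `VirtualCustomerOrders`, and the method `total_by_customer` are all hypothetical. The point it shows is that the view holds only connections to the sources and reads them at query time, rather than copying their contents into one repository first.

```python
import csv
import io
import sqlite3


def make_customer_db():
    """Source 1 (hypothetical): a relational store holding customer records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    conn.commit()
    return conn


# Source 2 (hypothetical): a flat file of orders, left in its own format.
ORDERS_CSV = "order_id,customer_id,amount\n100,1,250.0\n101,2,75.5\n102,1,20.0\n"


class VirtualCustomerOrders:
    """A logical view: it keeps handles to the sources, not copies of their data."""

    def __init__(self, conn, read_orders_text):
        self._conn = conn
        # Callable that returns the current CSV text each time it is invoked.
        self._read_orders_text = read_orders_text

    def total_by_customer(self):
        # Both sources are read when the query runs, so the result reflects
        # whatever they contain at that moment.
        names = dict(self._conn.execute("SELECT id, name FROM customers"))
        totals = {}
        for row in csv.DictReader(io.StringIO(self._read_orders_text())):
            name = names.get(int(row["customer_id"]))
            if name is not None:
                totals[name] = totals.get(name, 0.0) + float(row["amount"])
        return totals


conn = make_customer_db()
view = VirtualCustomerOrders(conn, lambda: ORDERS_CSV)
print(view.total_by_customer())
```

Because nothing is replicated, a change made in either source shows up the next time the view is queried. A real platform would add query pushdown, caching, and security on top of this idea; the sketch joins in application memory only to keep the example short.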

From Data Integration to Data Management

In addition to data integration, data virtualization now supports many aspects of data management, including metadata management, data catalogs, unified semantics, security, and data governance. It is also used to power newer architectures: it can turn a data fabric into a logical data fabric, and it can serve as a foundation for data mesh, a distributed, decentralized data management design concept that was born out of the shortcomings of traditional centralized architectures.

The popularity of data virtualization is bringing newcomers to market, but these vendors still focus on the data integration aspects of data virtualization, and it will be many years before they can deliver on data virtualization’s data management capabilities. If a data virtualization vendor claims to offer “all” data virtualization capabilities, or “modern” data virtualization supported by massively parallel processing (MPP) capabilities, it’s worth taking a closer look. Chances are, the vendor is unable to offer many of the newer data management capabilities, such as support for active metadata, advanced query optimization, or a unified security layer across disparate data sources. 

True data virtualization solutions have evolved to the level of advanced data management, supporting emerging technologies such as platform-as-a-service (PaaS) deployments, logical data fabric, and data mesh.

After all, analytics and operations are only as good as the data they can leverage, and companies need to see all their data without boundaries. Whether they are trying to understand their customers’ behavior, moving to the cloud, or analyzing their sensor data for better service delivery, modern data virtualization provides real-time, holistic views of their data without moving the data, thanks to strong data integration, data management, and data delivery capabilities. More importantly, it allows companies to jump light years ahead in maximizing the power of their data.
