Data virtualization, in a nutshell, is data integration without replication: a single “virtual” data layer is created to provide data services to multiple users and applications at the same time.
Why Data Virtualization Is a Necessity for Enterprises explains how data virtualization helps tackle data movement challenges by making a virtual dataset available in real-time for analysis or processing, while the actual data remains in the source locations. The best part of this technology is that the user does not have to deal with the technical details of the data, such as its physical location, its data type, or its security and configuration settings.
Data virtualization (DV) is a technology that, for the purpose of processing, collects data from disparate sources, locations, and formats, and then creates a “single, integrated version of the dataset” for data users. DV also provides an “abstracted, organized, and encapsulated view of the data” while the original data remains in the source systems.
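As a toy sketch of that abstraction (the sources, names, and schema here are invented purely for illustration), the following Python class exposes one integrated lookup interface over two heterogeneous sources, a key-value store and a flat file, while each record stays where it lives and is fetched only on demand:

```python
import csv
import io

# Two heterogeneous "sources": a key-value store (a dict here) and a flat file (CSV text).
KV_SOURCE = {"p1": {"sku": "p1", "price": 10}, "p2": {"sku": "p2", "price": 25}}
CSV_SOURCE = "sku,price\np3,40\np4,15\n"

class VirtualProductView:
    """A single, integrated view of product records.

    Callers never see where a record physically lives or what format it is
    stored in; records are resolved against the sources at lookup time and
    are never copied into a central store.
    """

    def get(self, sku):
        # Try the key-value source first.
        if sku in KV_SOURCE:
            rec = KV_SOURCE[sku]
            return {"sku": rec["sku"], "price": int(rec["price"])}
        # Fall back to the flat-file source, parsing it on demand.
        for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
            if row["sku"] == sku:
                return {"sku": row["sku"], "price": int(row["price"])}
        return None

view = VirtualProductView()
print(view.get("p2"))  # {'sku': 'p2', 'price': 25}  (served from the dict source)
print(view.get("p3"))  # {'sku': 'p3', 'price': 40}  (served from the CSV source)
```

Real DV platforms do far more (query pushdown, caching, security), but the essential contract is the same: one normalized view, no replication.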
Data virtualization technology can also substitute for ETL and data-warehousing capabilities in analytics or BI applications. Gartner’s Leveraging Data Virtualization in Modern Data Architectures offers clear, unbiased views about how Data Management professionals can benefit from data virtualization technology.
The Benefits of Data Virtualization Technology
For those who have just started exploring the immense potential of DV as a robust data integration technology, here are some business benefits frequently associated with this technology:
- Instant access to data
- In application development, less “lead time” is required for data availability
- Minimized possibility of data duplication
- More flexibility to change data sets or applications
A recent article uses the following quote from the Data Management Book of Knowledge to describe the process:
“[D]ata virtualization enables distributed databases, as well as multiple heterogeneous data stores, to be accessed and viewed as a single database. Rather than physically performing ETL on data with transformation engines, data virtualization servers perform data extract, transform and integrate virtually.”
How Can Data Virtualization Help Your IT Department?
The Cisco website offers a highly customer-focused description of DV technology:
- The capability of integrating disparate data from many sources without physically moving or replicating it removes the possibility of data duplication and reduces storage overhead
- The capability of “abstraction” enables data access without exposing the data’s location or configuration details
- The capability of “real-time” access to data ensures the latest version of data is always available
- The capability of “agility” ensures the data layer is universally available for multiple users or applications at a time — removing any possibility of computing disruptions
- The capability of “unified Data Governance and Security” ensures that the source and output data are accessible via a cohesive virtual layer, which clearly exposes data redundancy or data quality issues
- Consistent Data Quality ensures high-quality analysis
- The flexibility to change data sources or applications as requirements change
Recently, the concept of intelligent DV has been doing the rounds. Intelligent Data Virtualization and Freedom of Choice for Data Scientists states that intelligent DV offers some additional benefits, which include a seamless interface between the data and BI tools, easy integration of disparate datasets, enhanced query features, and “discoverability of enterprise data” by data scientists. Moreover, intelligent data virtualization provides a “virtual space” for data scientists to explore datasets.
Data Virtualization Adoption Rates: The Market Indicators
Gartner’s Market Guide for Data Virtualization projected that by 2020, 35 percent of enterprises would implement DV as a substitute for traditional data-integration techniques. Gartner stated:
“As an increasingly important part of a comprehensive data integration strategy, data virtualization is attracting renewed interest as organizations recognize its potential for a growing range of use cases.”
However, according to a Search Data Management feature, Gartner has more recently predicted that “60 percent of organizations will deploy data virtualization software as part of their data integration tool set by 2020.” This new adoption-rate statistic is a big jump from Gartner’s initial prediction of 35 percent stated in the guide above. The sudden change was attributed to enterprise IT departments’ struggle with the physical integration of disparate data silos, such as RDBMSs and NoSQL stores.
Mark Beyer, one of the authors of Gartner’s Data Virtualization Market Guide, thinks that although initially Data Management professionals were reluctant to give too much access to data through DV, the post-cloud generation has learned that “data is freely flying all over, so what’s the point in locking down the data?”
Data Virtualization Use Cases
An Introduction to Data Virtualization and Its Use Cases offered the following use cases:
- Virtual Data Warehouse: It can be set up much faster than a traditional data warehouse.
- Virtual Data Lake: It enables data collection and consolidation from disparate databases, both traditional and nontraditional. Fundamentals of Data Virtualization explains how the virtual data warehouse and the virtual data lake are significant improvements over the traditional ETL process, with “faster data access, data integration, data cleaning, and analytics tools for BI users.”
- Data Catalog: It enables fast data access for business users, business analysts, data scientists, and BI experts without requiring any technical information about the data.
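To make the virtual data warehouse idea concrete, here is a minimal, hypothetical sketch using Python’s built-in sqlite3 module, with two attached databases standing in for separate source systems. A temporary view presents both sources as one queryable dataset; no rows are ever copied into a central warehouse table:

```python
import sqlite3

# Two independent databases stand in for disparate source systems.
conn = sqlite3.connect(":memory:")                  # "CRM" source lives in main
conn.execute("ATTACH DATABASE ':memory:' AS erp")   # "ERP" source is attached

conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.customers VALUES (?, ?)",
                 [(3, "Initech")])

# The "virtual data warehouse": a temp view unifying both sources.
# (A TEMP view is used because it may reference any attached database.)
# The view stores no rows; it is resolved against the sources at query time.
conn.execute("""
    CREATE TEMP VIEW all_customers AS
        SELECT id, name, 'crm' AS source FROM main.customers
        UNION ALL
        SELECT id, name, 'erp' AS source FROM erp.customers
""")

rows = conn.execute(
    "SELECT id, name, source FROM all_customers ORDER BY id"
).fetchall()
print(rows)
# [(1, 'Acme', 'crm'), (2, 'Globex', 'crm'), (3, 'Initech', 'erp')]
```

Unlike a traditional warehouse load, updating a source table is immediately visible through the view, which is exactly the “real-time access” benefit described above.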
The Case Study Data Virtualization Seasons the Machine Learning and Blockchain Landscape for McCormick explains how the food flavoring giant McCormick is planning to use data virtualization, along with machine learning and blockchain technologies, to improve their quality control process.
Some other use cases include:
- In self-service analytics, data preparation can come with its own challenges like support for a wide variety of data types and formats, limited data transformation tools, Data Governance issues, and data sharing constraints. Data virtualization platforms remove these problems and give the user powerful tools to quickly prepare a dataset from any type of raw data source.
- Registry-style MDM, the technical details of which are explained in the Tibco whitepaper.
- Data sharing in multi-cloud environments, which again is a common problem in an increasingly cloud-based Data Management environment.
- Streaming Data Management in an IoT environment, where typically diverse data types from diverse data points (transactional and analytic processes) are collected, prepared, and shared across multiple platforms with full adherence to compliance requirements.
Data virtualization has an ever-growing number of use cases, to be sure. It has become an important aspect of the entire Data Management ecosystem for many organizations worldwide.