Data Virtualization (DV) differs from traditional Data Integration, where a change must be propagated across multiple layers; with Data Virtualization, new requirements and sources can be integrated or changed rapidly, making change easy for the business.
The Data Management Association International (DAMA) Data Management Body of Knowledge (DMBOK), second edition, describes Data Virtualization as:
“Data Virtualization enables distributed databases, as well as multiple heterogeneous data stores, to be accessed and viewed as a single database. Rather than physically performing ETL on data with transformation engines, Data Virtualization servers perform data extract, transform and integrate virtually.”
In a KDnuggets article 75 Big Data Terms to Know to Make Your Dad Proud, Ramesh Dontha explains that Data Virtualization is “an approach to Data Management that allows an application to retrieve and manipulate data without requiring technical details of where [it is] stored and how it is formatted.” For example, he states that social media providers use Data Virtualization to store user photos on their networks.
In a recent DATAVERSITY® interview, Lakshmi Randall, Director of Global Product Marketing for Denodo, said that with Data Virtualization, “You don’t need large infrastructure; customers are not coming to the data – the data is coming to them.”
Denodo has specialized in Data Virtualization for over fifteen years with a philosophy of providing continuous delivery by maximizing local processing and minimizing network movement. “Data Virtualization is the one thing that we’ve focused on, and we do it extremely well,” she states.
What is Data Virtualization?
During the interview, Randall outlined key features of Data Virtualization as being:
- Zero Replication: Data is not moved or copied. Instead, it stays where it is and is connected to other sources, no matter the location. This greatly improves the speed at which users can access data.
- Abstraction: Business users can access the data without concern about where it resides.
- Real-Time Access: As source data is updated or changed, the data is available immediately.
- Agility: Changes can be made without impacting the business. Data Virtualization facilitates a universal semantic layer across multiple consuming applications.
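The features above can be sketched in a few lines of illustrative Python. The sources, names, and data here are invented for the example; a real Data Virtualization server such as Denodo exposes this through SQL and source connectors, not hand-written code:

```python
import sqlite3

# Two hypothetical heterogeneous sources: a relational database and a
# "remote" feed standing in for a second system (e.g., a REST API).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Ada"), (2, "Grace")])

def fetch_orders():
    # Stands in for a live call to the second source. Nothing is
    # replicated into a warehouse -- data is pulled at query time.
    return [{"customer_id": 1, "total": 120.0},
            {"customer_id": 2, "total": 80.0},
            {"customer_id": 1, "total": 45.5}]

def customer_spend():
    """A 'virtual view' joining both sources on demand.

    The caller never sees where the data lives or how it is stored
    (abstraction), and because sources are read at query time, any
    change in them is visible immediately (real-time access).
    """
    names = dict(db.execute("SELECT id, name FROM customers"))
    spend = {}
    for order in fetch_orders():
        cid = order["customer_id"]
        spend[cid] = spend.get(cid, 0) + order["total"]
    return {names[cid]: total for cid, total in spend.items()}

print(customer_spend())  # {'Ada': 165.5, 'Grace': 80.0}
```

The point of the sketch is that `customer_spend` holds no data of its own: delete it and nothing is lost, change a source and the next call reflects it.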
Because there’s no need for large infrastructure, implementation costs are low. In Predicts 2017: Data Distribution and Complexity Drive Information Infrastructure Modernization, Gartner Research agreed, predicting major savings with Data Virtualization. The report states:
“By 2018, organizations with Data Virtualization capabilities will spend 40 percent less on building and managing data integration processes for connecting distributed data assets.”
In his article An Introduction to Data Virtualization and the Best Tools for 2018, Timothy King said, “with modern, distributed analytics solutions becoming the new norm, companies crave the ability to obtain a unified view of their data without having to move it.” He cites Gartner’s projection that by 2020, 35 percent of enterprise organizations will implement DV as an alternative to data integration.
What Data Virtualization is Not
There is some confusion surrounding the concept of Data Virtualization; enough so that Denodo offers a number of explanations that outline the six concepts that are often mistaken for the technology. Other industry professionals have also weighed in to clarify some additional misperceptions. According to Denodo and other experts:
- It’s not Data Visualization, although it sounds similar. According to Denodo, visualization refers to the display of data to end users graphically as charts, graphs, maps, reports, etc. Conversely, “Data Virtualization is middleware that provides data services to other data visualization tools and applications.”
- It’s not a Replicated Data Store. “Data Virtualization does not normally persist or replicate data from source systems to itself. It only stores metadata for virtual views and integration logic.”
- It’s not a Logical Data Warehouse. In a blog post entitled The Logical Data Warehouse is NOT the same as Data Virtualization, Rick van der Lans states, “The logical data warehouse architecture is, well, the word says it all, an architecture; whereas, a Data Virtualization server is a technology.” Data Virtualization may be used as part of a logical data warehouse architecture, but there are many more use cases. “Data Virtualization technology relates to a logical data warehouse architecture the way a SQL database server relates to a data warehouse or data mart.”
- It’s not only Virtualization. When the term “virtualization” is used alone, it typically refers to hardware virtualization — servers, storage disks, networks, etc.
- It’s not Data Federation. According to Denodo, “Data Virtualization is a set of capabilities that includes – but is not limited to – advanced data federation.” Rick van der Lans has said, “Data Virtualization is much more about abstraction. It’s abstracting the peculiarities of specific data sources. And there are many features required for that, data federation being one.”
- It’s not Virtualized Data Storage. Some companies and products use the exact term “Data Virtualization” to describe virtualized database software or storage hardware virtualization solutions. However, those solutions do not provide real-time data integration and data services across disparate structured and unstructured data sources.
- It’s not Copy Data Virtualization. In the DATAVERSITY article Data Virtualization vs. Copy Data Virtualization, Ravi Shankar, CMO at Denodo, said that Data Virtualization is a kind of Data Integration technology, but unlike other Data Integration solutions, Data Virtualization creates integrated views of data across multiple sources without moving it. Copy Data Virtualization performs a much more specific function than Data Virtualization: it virtualizes redundant data copies across an organization to reduce storage footprints.
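The “metadata only” distinction in the list above (a DV server is not a replicated data store) can be made concrete with a toy sketch. Everything here is invented for illustration; the idea is simply that the virtualization layer persists only view definitions and integration logic, never the rows:

```python
# Hypothetical source registry: each entry is a callable that fetches
# live data, standing in for a connector to an external system.
SOURCES = {
    "crm": lambda: [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}],
    "billing": lambda: [{"id": 1, "amount": 300}, {"id": 2, "amount": 150}],
}

class VirtualView:
    def __init__(self, source_names, integrate):
        # Only metadata is stored: which sources to hit, and the
        # integration logic for combining them. No rows are copied.
        self.source_names = source_names
        self.integrate = integrate

    def query(self):
        # Sources are read at query time, so results always reflect
        # their current state.
        rows = [SOURCES[name]() for name in self.source_names]
        return self.integrate(*rows)

def join_on_id(crm_rows, billing_rows):
    # Integration logic: enrich CRM rows with billing amounts.
    amounts = {r["id"]: r["amount"] for r in billing_rows}
    return [{**r, "amount": amounts[r["id"]]} for r in crm_rows]

revenue_by_region = VirtualView(["crm", "billing"], join_on_id)
print(revenue_by_region.query())
# [{'id': 1, 'region': 'EMEA', 'amount': 300},
#  {'id': 2, 'region': 'APAC', 'amount': 150}]
```

A replicated data store would instead copy both sources’ rows into its own storage; here the `VirtualView` object could be serialized as a few bytes of metadata and nothing about the underlying data would need to move.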
“Our mission is to enable enterprises to be agile, so they can be competitive and innovative with their business models,” said Randall. Organizations have different types of users, with varying skill sets, and users need access at multiple points in time, she said. “It’s very important to tailor the data to meet their needs,” so users can quickly and easily access data and use it for analysis instead of spending time looking for data, preparing data, or blending data. Business users don’t need to know what format the data is in, where it resides, or whether it’s Hadoop, or MongoDB, or SQL. “They just need access to the data that’s relevant to them, so they can do their daily analysis.”
Data Security and Data Governance are built into the platform, with personas and permissions for every type of user. Randall said that Denodo rapidly integrates and delivers data for a fraction of the cost compared to other integration approaches in the industry “and typically provides a solution 80 percent faster than other approaches as well.”
Over the years, Denodo has matured their Data Virtualization capability and they are now offering a fourth-generation platform. Starting with simple data federation, adding robust performance capabilities, and support for Cloud platforms, they now have a Data Catalog within the platform, Randall said.
With their latest release, Denodo’s main focus is still Data Virtualization, but with 7.0, in-memory parallel processing engines are supported natively. In addition, they now have a location-agnostic architecture, so Denodo is available on Azure marketplace, AWS marketplace, and Docker Container. “It doesn’t matter where your data is. Denodo is able to integrate that across all those locations,” she said.
Other new features come under the heading of automated lifecycle management. “Solution Manager is a new interface capable of streamlining a DevOps approach, especially for large deployments, whether for managing licenses or for moving a release from development to QA,” Randall noted. Support for self-service integration, which debuted in 6.0, has been expanded with more business-friendly enhancements. With an interface for every type of user, 7.0 builds on those business-friendly features and provides opportunities for both business and IT to collaborate, she said. “For a Data Virtualization platform, that’s very unusual.”
The new release also provides active metadata by incorporating the catalog as part of the Data Virtualization platform. “The beauty of Data Virtualization is that it is tied to all the consuming applications, so we know at any given time the questions that are asked by the users,” which can then be used to make additions and recommendations that further enrich the data catalog.
In pursuit of strengthening the bridge between business and IT and bringing further benefits to line-of-business users, Randall said, “The future is more around cognitive automation, providing additional value, and self-improvement within the Data Virtualization platform.” With a multi-cloud or hybrid cloud approach, users now have the freedom to choose different Cloud providers. Randall said that Data Virtualization’s built-in, location-transparent architecture, coupled with large-scale analytics architectures, naturally supports this functionality.
Randall stresses that because it uses a universal semantic layer, by nature Denodo’s Data Virtualization can support any new consuming applications that may emerge in the future.
“The business value really is enabling the organizations to be more agile in how they integrate their data” without compromising KPIs like performance, security, and governance, she said. “Denodo enables rapid decision-making by abstracting complexity, integrating data quickly, and provisioning that data in multiple formats so users can use their tool of choice to consume data and perform analysis.”
For case studies using Data Virtualization, see A Business Intelligence Upgrade Keeps Ultra Mobile Operating at Light Speed and Indiana University, Data Virtualization, and The Decision Support Initiative.