Let’s add another “V” into the Big Data pot of volume, variety, velocity, and veracity.
This time, it’s Data Virtualization. The topic is getting more attention on the heels of this spring’s Forrester Enterprise Data Virtualization Wave report, which defines the market as hot. It describes the technology as an agile integration platform that orchestrates data in real time or near real time from disparate data sources (whether on-premises or in the Cloud) into coherent, self-service data services to support various use cases and workloads, including extreme transactions, analytics, and predictive analytics. Leaders in the sector, according to Forrester, include vendors such as Denodo Technologies, Informatica, and SAP, whose products have matured in critical features from performance to scalability to security.
In a report published last year and sponsored by Cisco – also a player in the space through its acquisition of Composite Software – Radiant Advisors points out that Data Virtualization is compelling: it makes it possible to avoid the longer, riskier data integration development lifecycles that struggle to keep pace with fast-moving business needs. The technology also gets credit for driving speed to market and deployment, and for supporting on-the-fly organizational change, while enabling self-service user access and data navigability.
According to Forrester, 65 percent of Fortune 500 companies are doing Data Virtualization today, with the technology seeing broad acceptance beyond the markets where it has traditionally had a home (such as financial services, telco, and government) into sectors such as manufacturing, retail, healthcare, and high tech. Radiant adds that challenges to even broader adoption remain, however, citing issues such as building a business case that articulates the value of Data Virtualization in terms of speed of integration, alongside the ability to manage ever-growing amounts of data in a timely, cost-efficient way.
Data Virtualization has its roots in the concepts of Enterprise Information Integration (EII) and data federation, says Denodo CMO Ravi Shankar, but is a far broader technology. The company has about 250 customers, including some very large familiar names, and has been around for more than 16 years, so it’s quite familiar with the journey from where Data Virtualization began to where it is now. In its earlier EII or data federation incarnations, he says, the concept was that data would be gathered across multiple sources and given to the consuming application or user, but there was no notion of data quality or governance.
“Data Virtualization actually tries to apply some transformation or curation to the data,” he explains, including data quality rules for de-duplication and consistency, and it attends to governance so that those with the right levels of security can access what they need. “It goes beyond federating the data into the data quality and governance aspect,” he says.
With Data Virtualization’s three-layer architecture, data from various internal and external sources (applications, Cloud apps, Hadoop platforms like Cloudera), originating in multiple formats, makes up the bottom layer. Data consumer services sit at the top layer. In between exists a virtual layer that acts and feels like a single repository of data, as information has been related, de-duped, combined, and curated to deliver a single version of the truth. In this model, very different worlds can come together to the enterprise’s benefit: NoSQL distributed databases living in a Hadoop software ecosystem can connect to traditional data warehouses to form that single (virtual) repository. Enterprises, for example, can gain marketing advantages by combining their internal knowledge of customer purchases stored in the latter with the high volumes of clickstream data about Web page visits or social media sentiment flowing into the former, to better understand clients’ interests in products and then act on that knowledge.
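The three-layer model can be pictured in a few lines of code. This is a toy sketch, not any vendor’s API; all source names and fields are invented for illustration.

```python
# Bottom layer: disparate sources, each with its own shape.
warehouse_purchases = [  # traditional data warehouse
    {"customer_id": 1, "product": "laptop", "amount": 1200.0},
    {"customer_id": 2, "product": "phone", "amount": 650.0},
]
hadoop_clickstream = [  # high-volume clickstream landing in Hadoop
    {"cust": 1, "page": "/tablets", "dwell_seconds": 42},
    {"cust": 1, "page": "/laptops", "dwell_seconds": 15},
]

# Middle (virtual) layer: relates and combines the sources on the fly;
# nothing is copied into a new physical store.
def customer_360(customer_id):
    purchases = [p for p in warehouse_purchases if p["customer_id"] == customer_id]
    clicks = [c for c in hadoop_clickstream if c["cust"] == customer_id]
    return {"customer_id": customer_id,
            "purchases": purchases,
            "recent_interest": clicks}

# Top layer: a consumer (a report, a dashboard) sees one coherent view.
view = customer_360(1)
```

The consumer never learns that purchases and clicks live in two different systems; it only sees the combined view.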
Data consumers like marketing or other business execs don’t know – and don’t need to know – where the data actually physically exists. They don’t need to function as data mediators, nor do they need to ask IT to do the same to achieve a holistic view of information. “All the information from multiple different systems appears as if from a single place with Data Virtualization technology, and the information is real-time,” Shankar says, published to consuming systems like reporting applications in order to answer user queries. As transactions happen in the bottom layer – a customer buys something and it’s recorded in the point-of-sale system, for example – anyone querying about customer purchases at the top gets that included with the answer immediately, he says. “Connect, combine, publish” is how he explains the process.
One of the capabilities that seem poised to bring more customers in more verticals to the Data Virtualization table is not having to deal with data latency. Shankar notes, as an example, a Denodo customer in the pharmaceuticals industry that has long been using data warehouse technology to store nightly batch sales updates. But it wanted something faster, something that would enable it to know real-time sales status so that it could create more on-the-spot agility in its production processes to meet market demand. “Its latency was one day before and that lag was a challenge for them in manufacturing,” he says. “With Data Virtualization, when you query data at the top of a consuming system it goes in real time to the source system.”
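The latency point can be made concrete with a toy sketch (all names hypothetical): because the virtual layer delegates each query to the live source rather than to a warehouse copy loaded by a nightly batch, a transaction recorded in the point-of-sale system is visible in the very next answer.

```python
# Bottom-layer source: the live point-of-sale system.
pos_system = [{"customer": "acme", "item": "widget"}]

def purchases_for(customer):
    # The virtual layer reads the source at query time, not a stale copy.
    return [t for t in pos_system if t["customer"] == customer]

before = len(purchases_for("acme"))                        # one purchase so far
pos_system.append({"customer": "acme", "item": "gadget"})  # a new sale is recorded
after = len(purchases_for("acme"))                         # visible immediately
```

With a batch-loaded warehouse, `after` would not change until the next nightly load; here it reflects the new sale at once.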
The Internet of Things is equally a catalyst for Data Virtualization for some customers and potential clients. A heavy machines equipment manufacturing company uses sensors to constantly monitor the condition of its systems for performance and service needs, which it collects in a NoSQL database with Hadoop, he says as an example. That’s a huge volume of data and it has to be married to data stored in traditional systems that classify machines, their IDs, type, location, and so on. It then has to respond to queries about what system is in line for service soon or immediately report that something has broken down so that a replacement part can be automatically ordered.
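The IoT scenario boils down to a join between the two worlds at query time. A minimal sketch, with invented machine IDs and fields, of answering “which machines need service?”:

```python
# NoSQL/Hadoop side: huge, fast-arriving sensor readings.
sensor_readings = [
    {"machine_id": "M1", "vibration": 0.9, "status": "degraded"},
    {"machine_id": "M2", "vibration": 0.2, "status": "ok"},
]

# Traditional side: the asset master classifying each machine.
asset_master = {
    "M1": {"type": "excavator", "location": "site-7"},
    "M2": {"type": "crane", "location": "site-3"},
}

def machines_needing_service():
    # Marry sensor status to machine classification at query time.
    return [
        {**asset_master[r["machine_id"]], "machine_id": r["machine_id"]}
        for r in sensor_readings
        if r["status"] != "ok"
    ]
```

The answer carries the machine’s type and location from the traditional system alongside the condition signal from the sensor store, which is exactly what an automatic parts-ordering workflow would need.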
An auto manufacturer with whom Shankar has had conversations sees a use case for Data Virtualization for the connected car:
“A vast amount of information about a car’s status will be coming into a large Hadoop data repository and will need to combine with information in existing systems that store other data about the car to provide information to the owner about needed services,” or other issues, he says. “Data Virtualization is able to do that.”
Issues to be Addressed
The technology is the easy part when it comes to selling the idea of Data Virtualization, Shankar notes. The Radiant Advisors report found that one obstacle some companies face in getting Data Virtualization accepted is user resistance; at one organization, for instance, power users balked at the idea of inserting a layer of abstraction between data stores and the users of those stores.
Shankar says he’s seen similar cultural and political issues arise as well. “People can be very protective of their systems and data,” he says. Data warehouses, CRM systems, and other data sources may all be managed by different teams, and those teams may be averse to giving others a view into all their data. “We have seen gatekeepers try to stop people from entering into their systems.” But ideally, that will change once it becomes clear that Data Virtualization, unlike physical data integration, is non-intrusive, leaving data owners less reluctant to share their data.
If business users don’t see a need for quick, holistic access to different types of data to perform their day-to-day jobs, though, they’re not going to buy into Data Virtualization. No matter how good any solution is, “if the business user doesn’t see its value any technology is useless,” Shankar says. In certain cases, however, the value can accrue purely for technical reasons. For example, IT teams have used Data Virtualization in application or systems modernization projects, where it provides uninterrupted access to underlying data while legacy databases are replaced with modern Big Data repositories. “It is like renovating your house while living in it.”
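The “renovating while living in it” point is easy to sketch: consumers query a stable virtual view, so the legacy source can be swapped for a modern repository underneath without interrupting access. Names below are invented for illustration.

```python
# Two physical backends: the legacy store and its modern replacement.
legacy_db = {"cust-1": {"name": "Acme", "tier": "gold"}}
modern_store = {"cust-1": {"name": "Acme", "tier": "gold", "region": "EMEA"}}

active_source = legacy_db  # what the virtual layer currently delegates to

def get_customer_name(cust_id):
    # Consumers only ever see this stable interface.
    return active_source[cust_id]["name"]

name_before = get_customer_name("cust-1")  # served from the legacy store
active_source = modern_store               # cut over behind the view
name_after = get_customer_name("cust-1")   # consumers are unaffected
```

The query before the cutover and the query after it return the same answer; only the plumbing behind the view has changed.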
Shankar believes that, at least in the U.S., Data Virtualization is starting to take off, though it still needs to gain more traction in other parts of the world. What will help it continue to gain steam in the States and get into gear elsewhere, he thinks, is sponsorship of the concept – and that’s where the emerging role of Chief Data Officer comes in. (See recent DATAVERSITY® stories on the CDO role here and here.) That’s the prime constituent Denodo is now targeting, he says. That role “transcends business and IT and the person in it can dictate to gatekeepers that they can’t hold data back,” he says.