Data Virtualization through the Looking Glass

by Paul Williams

As the environments used to manage information become more complex, the data itself begins to lose its overall business value. It gets increasingly difficult to glean valuable knowledge from a tangled mass of databases, websites, document repositories, and so on. A layer of abstraction providing a common view of an enterprise’s disparate data resources is the promise of Data Virtualization.

A major component of SOA, data integration, Business Intelligence systems, and the emerging “as a Service” trend, Data Virtualization allows executives to do what they do best – manage their business. Having a C-level employee spend an inordinate amount of time dealing with the intricacies of an overly complex enterprise information architecture does not help the bottom line.

Data Virtualization means many things. It can abstract the technical details of a database instance, allowing the user to focus on what is important. It is able to combine data from different sources into one result set. It allows for a common dashboard view of corporate information to be shared by all employees.

This article looks at products from a few of the leading Data Virtualization solution vendors, providing some insight to their features and capabilities.

Denodo for Data Virtualization and Data Integration

Denodo, located in Palo Alto, California, offers products that serve both data integration and virtualization needs. Founded in 1999, the company is privately held and backed by a $2.8B venture capital firm.

The Denodo Data Services Platform is a middleware solution containing functionality for Data Virtualization, data federation, as well as Cloud data integration. It offers a rich set of tools for data transformation, allowing data view “mashups” to be shared across the enterprise. The platform’s architecture includes three separate layers: Connect, Combine, and Publish.

The Connect Layer uses a collection of data wrappers to communicate with a variety of data sources, including JDBC- and ODBC-compliant databases, MDX data warehouses, and web services supporting REST and SOAP. The wrappers also handle Cloud-based data as well as the semi-structured information found on websites. Denodo offers an SDK useful for creating data wrappers for otherwise unsupported information sources.
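To make the wrapper concept concrete, here is a minimal Java sketch of what a custom data wrapper might look like. The DataWrapper interface and JdbcWrapper class are hypothetical illustrations, not types from Denodo’s actual SDK:

```java
import java.sql.*;
import java.util.*;

// Hypothetical wrapper abstraction; Denodo's real SDK defines its own types.
interface DataWrapper {
    List<Map<String, Object>> fetch(String query) throws Exception;
}

// A wrapper over any JDBC-compliant source, one of the source types
// a Connect Layer typically supports out of the box.
class JdbcWrapper implements DataWrapper {
    private final String url, user, password;

    JdbcWrapper(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    @Override
    public List<Map<String, Object>> fetch(String query) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.put(md.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```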

The Combine Layer includes data transformation and federation functionality. It leverages the abstraction provided by the Connect Layer to create a singular view of an enterprise’s data. The performance, security, and query management functionality of the Denodo platform resides within the Combine Layer.
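The federation step can be pictured as joining rows pulled from two wrapped sources on a shared key. The naive in-memory hash join below is only a sketch of the concept; a real platform optimizes queries and pushes work down to the sources rather than joining everything client-side:

```java
import java.util.*;

// Minimal illustration of federation: join rows from two sources on a key.
// Column names and the join strategy are illustrative only.
class Federator {
    static List<Map<String, Object>> join(
            List<Map<String, Object>> left,
            List<Map<String, Object>> right,
            String key) {
        // Index the right-hand rows by the join key.
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            index.put(row.get(key), row);
        }
        // Merge each matching pair into a single combined row.
        List<Map<String, Object>> combined = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Map<String, Object> match = index.get(row.get(key));
            if (match != null) {
                Map<String, Object> merged = new LinkedHashMap<>(row);
                merged.putAll(match);
                combined.add(merged);
            }
        }
        return combined;
    }
}
```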

The Publish Layer provides functionality to share an enterprise’s “single point of interface,” be it in the form of an OpenAjax widget on a web page, or in a web service consumable by other corporate systems. This layer features a variety of flexible options for the presentation and reuse of data.
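As a rough illustration of the publishing idea, the sketch below exposes a combined result as a JSON endpoint using the JDK’s built-in HTTP server. The path and payload are invented; a real Publish Layer generates full SOAP/REST services and widgets:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Toy sketch of publishing a combined view over HTTP.
public class PublishExample {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/customers", exchange -> {
            // In practice this JSON would come from the federated view.
            byte[] body = "[{\"id\":1,\"name\":\"Acme\"}]"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```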

These three layers combine to make the Denodo Data Services Platform worthy of consideration for any enterprise’s Data Virtualization needs.

Informatica Integrates Data

Informatica is known for its wide range of solutions in a variety of areas within Data Management. A publicly held company, Informatica was formed in 1993 and is headquartered in Redwood City, California.

Out of the company’s wide array of product offerings, Informatica Data Services (IDS) is the one solution aimed at Data Virtualization. IDS features a single, combined platform for data integration and data federation. It also includes on-the-fly data transformation capabilities with integrated data quality and data masking functionality.
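As an illustration of what an on-the-fly masking transformation does, here is a small, self-contained Java sketch. It is not Informatica’s API, just the general technique of hiding all but the last few characters of a sensitive value:

```java
// Illustrative masking transformation: keep the last four characters
// of a value and replace the rest with asterisks.
public class MaskExample {
    static String mask(String value) {
        int visible = 4;
        if (value == null || value.length() <= visible) return value;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length() - visible; i++) sb.append('*');
        sb.append(value.substring(value.length() - visible));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("4111111111111111")); // ************1111
    }
}
```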

Getting the business and IT sides of the shop to work together seamlessly is an important goal of most enterprises. IDS facilitates this through the use of role-based tools that foster business and IT collaboration using shared metadata. This includes Informatica Developer, an Eclipse-based IDE allowing developers to create and deploy web services.

Data profiling is handled using a service-based model allowing for “any stage” integration of this functionality. The data and logic are documented with metadata, and the profiling results are shareable between business stakeholders and developers.
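A bare-bones version of column profiling can be expressed in a few lines: count the nulls and distinct values per column. The sketch below is illustrative only; commercial profilers such as IDS also derive patterns, ranges, and value histograms:

```java
import java.util.*;

// Toy column profiler: null counts and distinct values per column.
public class ProfileExample {
    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Austin", 42));
        rows.add(row(null, 42));
        rows.add(row("Boston", 7));

        Map<String, Integer> nulls = new LinkedHashMap<>();
        Map<String, Set<Object>> distinct = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            for (Map.Entry<String, Object> e : r.entrySet()) {
                if (e.getValue() == null) {
                    nulls.merge(e.getKey(), 1, Integer::sum);
                } else {
                    distinct.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                            .add(e.getValue());
                }
            }
        }
        System.out.println("nulls: " + nulls);
        System.out.println("distinct: " + distinct);
    }

    static Map<String, Object> row(String city, Integer orders) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("city", city);
        m.put("orders", orders);
        return m;
    }
}
```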

IDS also offers a single, unified environment for centralized Data Governance and security policy management. In addition to data security, it is also possible to control privacy, quality, and freshness of data from this same platform.

Considering Informatica’s full range of Data Management products, the company’s broad experience is a relevant background to take into account when evaluating Informatica Data Services as a Data Virtualization platform.

SAS Brings Virtualization with DataFlux

The SAS Institute is commonly known for its line of statistical and analytical software. But through a series of acquisitions, SAS has gained expertise in many other areas of Data Management. Its 2000 acquisition of DataFlux is especially relevant to this article.

DataFlux’s platform suitable for Data Virtualization includes DataFlux Data Management Studio and its companion, DataFlux Data Management Server. The Studio product has a common interface shareable by business and technical employees. Both the development and delivery of data integration, data quality, and MDM solutions are managed within the application.

This shared approach facilitates the internal data application design process, in addition to delivery and change management. Business processes and rules are easily implemented and managed. Consistency in visual elements and the overall enterprise-wide user interface becomes second nature.

Virtualization tools include the ability to merge internal and external data from a variety of sources using a range of processing modes, including batch, real-time, and virtual. It is possible to both monitor data quality rules and manage corporate metadata from one interface.
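To illustrate what monitoring a data quality rule involves, here is a hypothetical Java sketch of a rule as a named predicate evaluated over a column’s values. It does not reflect DataFlux’s actual rule syntax:

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical data quality rule check: a named predicate run over values.
public class QualityRuleExample {
    record Rule(String name, Predicate<String> test) {}

    public static void main(String[] args) {
        Pattern email = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");
        Rule rule = new Rule("valid email",
                v -> v != null && email.matcher(v).matches());

        List<String> values = List.of("ceo@example.com", "not-an-email");
        long passed = values.stream().filter(rule.test()).count();
        System.out.printf("%s: %d of %d passed%n",
                rule.name(), passed, values.size());
    }
}
```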

DataFlux Data Management Server works in concert with the Studio product, adding the capabilities needed in distributed environments. This includes parallel processing, an array of ETL transformations, and the overall scalability required when dealing with high transaction volumes.

DataFlux Data Management Studio and Server are worth considering in any search for a Data Virtualization solution.

IBM and InfoSphere Federation Server

IBM needs no introduction. Big Blue’s massive line of Information Technology products also contains one suitable for Data Virtualization applications: IBM InfoSphere Federation Server (IFS).

IFS is sold in a variety of packages, from an all-encompassing Enterprise Edition to versions specific to Data Warehousing, SAP, data integration, and data quality applications. The InfoSphere Business Information Exchange package leverages the creation of a common business language to enhance an enterprise’s efforts at Data Governance and Metadata Management.

Federation Server features connectivity to a wide variety of data sources using ODBC. These range from legacy DB2, VSAM, and IMS instances to relational databases, XML, and other structured and unstructured formats. Federated queries are able to scale to manage higher volumes of data using IFS’s parallel processing architecture.
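Because federated sources appear to clients as ordinary tables (registered in Federation Server as nicknames), they can be joined with standard SQL over JDBC. In the assumed example below, the connection URL and the CUST_DB2 and ORDERS_VSAM nicknames are made up for illustration:

```java
import java.sql.*;

// Hypothetical federated query through JDBC: the two nicknames stand in
// for a DB2 table and a VSAM file mapped by Federation Server.
public class FederatedQueryExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:db2://fedserver:50000/FEDDB"; // assumed instance
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT c.name, SUM(o.amount) AS total " +
                 "FROM CUST_DB2 c JOIN ORDERS_VSAM o ON c.id = o.cust_id " +
                 "GROUP BY c.name")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + ": "
                        + rs.getBigDecimal("total"));
            }
        }
    }
}
```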

IFS provides visual tools used for federated data modeling and discovery. Federated two-phase commits ensure data quality across distributed data systems. Legacy development work gets leveraged through the ability to run stored procedures on heterogeneous data sources.
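The two-phase commit protocol itself is simple to sketch: every participant first votes on whether it can commit, and the transaction commits only if all votes are yes. The toy coordinator below illustrates the idea; real systems use XA resource managers rather than this simplified interface:

```java
import java.util.*;

// Conceptual two-phase commit coordinator; illustrative only.
public class TwoPhaseCommitExample {
    interface Participant {
        boolean prepare();  // phase 1: persist changes and vote
        void commit();      // phase 2: make prepared changes durable
        void rollback();    // phase 2: undo prepared changes
    }

    static void run(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        boolean allVotedYes = true;
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                allVotedYes = false;
                break; // a single "no" vote aborts the transaction
            }
        }
        if (allVotedYes) {
            for (Participant p : participants) p.commit();
        } else {
            for (Participant p : prepared) p.rollback();
        }
    }
}
```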

Version 9.1 of InfoSphere Federation Server includes enhanced policy and rules support for information governance. Web-based blueprints are now shareable as metadata to facilitate collaboration on enterprise data policy. Enterprise Metadata Management also now supports metadata import and improved integration.

InfoSphere Federation Server appears to be a robust tool for Data Virtualization, with a large collection of supported databases and data formats. It also features a collection of tools to manage metadata and enterprise data quality rules.

Oracle’s Data Integration Products

Oracle is best known for its relational database product, but through acquisition, has now extended its reach into many areas of Information Technology. The world of Data Virtualization is no exception, as Oracle Data Integration Suite includes a full set of Data Management components for data integration.

Data Integration Suite performs integrations in bulk with batch processing, in real time, or at an object level. It leverages a portable Java-based platform highly suitable for SOA environments.

In addition to migration functionality suitable for integration applications, DIS includes services for Data Governance, data quality, and data profiling. The governance functionality supports data cleansing, lineage tracking, and auditing.

Related products in the Data Integration Suite family include the Oracle Data Integrator ETL tool, the Oracle Hyperion Data Relationship Manager, and Oracle BPEL Process Manager, used for the development of business processes.

Oracle GoldenGate is another product option suitable for integrating data from disparate sources, providing the singular view that facilitates C-level decision making. Both product lines, part of Oracle’s Fusion Middleware series, make the company a relevant player in Data Virtualization technology.

Data Virtualization, including closely related technologies like Data Federation and Data Integration, remains a valid option for companies looking to lessen the impact of complex data architectures. It allows executives and employees to concentrate on business decisions focused on meaningful information, unencumbered by the technical details related to the data itself.
