The “Next Level” of Data Warehousing: Real Time Integration

By Jelani Harper  /  March 18, 2014

The explosion of Big Data has revolutionized the landscape of Data Management – whether organizations have a Big Data initiative or not. In addition to augmenting traditional proprietary sources of data with innumerable forms of structured and unstructured data, Big Data technologies (most notably Hadoop) can provide a degree of real-time integration that has transformed the concept of Data Warehousing into a much less expensive, more comprehensive platform for data aggregation.

Thanks to the replication and clustering capabilities of vendors such as Continuent, with its Continuent Tungsten and Tungsten Replicator 3.0 products, enterprises can reap a number of benefits from using Hadoop as a mega Data Warehouse, such as:

  • Integration: Tungsten is able to replicate data at extremely high volumes, enabling organizations to aggregate disparate data sources for more effective analytics. With Hadoop’s scalability, organizations can load virtually all of their data there.
  • Real-time analytics: The replication capabilities of Tungsten Replicator can copy data into Hadoop (or into other databases) nearly instantly, taking advantage of improvements in Hadoop for real-time analytics with technologies such as Cloudera’s Impala, HBase, and others.
  • Expedient installation: A number of aspects of Tungsten are automated, which lets organizations expedite installations in a quasi-plug-and-play style and makes it easier to add different data sources.
  • Reduced system loads: Tungsten’s replication and clustering capabilities place minimal load on the systems they read from, which frees those systems’ capacity for other applications.

Although Tungsten works with other databases (such as MySQL, Oracle, and numerous NoSQL options), the 3.0 version (released February 7) is the first that supports Hadoop, and it thereby points to the future of Data Warehousing. According to Continuent CEO Robert Hodges:

“Anybody who’s using Hadoop needs to be looking at solutions like ours which can actually pull the data and have it positioned and ready to load into Hadoop literally in seconds. I think this whole real time issue is going to be really important for businesses over the next couple of years.”

Instant Integration

Perhaps the most valuable aspect of Continuent Tungsten is its capacity to rapidly integrate data sources and types. (Continuent Tungsten provides clustering in a Database-as-a-Service (DBaaS) model and includes Tungsten Replicator, which is also available separately and is open source.) Such integration needn’t always involve Hadoop, although doing so provides the ideal means for running a comprehensive set of analytics on diverse data (Big or otherwise) such as sentiment and transaction data. Tungsten was designed explicitly to extract data from both Oracle and MySQL. Database options for loading include Hadoop, MongoDB, Vertica, and others, which may involve the use of code.

As a DBaaS, Tungsten is able to cluster any number of databases (via Tungsten Replicator) and place a connectivity layer atop them so that applications (such as analytics) interface with the different databases as just one. The ability to cluster databases both supports the growth of the enterprise and increases performance for essential functions like querying.
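The idea of a connectivity layer that makes several databases look like one can be pictured with a minimal Python sketch. This is not Tungsten's API: the `ClusterConnector` class, its `put` and `get` methods, and the policy of sending writes to a primary while rotating reads across replicas are all hypothetical, and plain dictionaries stand in for real databases.

```python
from itertools import cycle
from typing import Any, Dict, List, Optional


class ClusterConnector:
    """Hypothetical single entry point in front of several database nodes."""

    def __init__(self, primary: Dict[str, Any], replicas: List[Dict[str, Any]]):
        self.primary = primary
        self.replicas = replicas
        # Rotate reads across replicas; fall back to the primary if there are none.
        self._read_nodes = cycle(replicas) if replicas else cycle([primary])

    def put(self, key: str, value: Any) -> None:
        # Writes go to the primary. The loop below is a stand-in for
        # replication, which a real cluster would perform asynchronously.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def get(self, key: str) -> Optional[Any]:
        # The application never chooses a node; the connector does.
        return next(self._read_nodes).get(key)


primary: Dict[str, Any] = {}
replicas: List[Dict[str, Any]] = [{}, {}]
connector = ClusterConnector(primary, replicas)

connector.put("customer:42", {"name": "Ada"})
first_read = connector.get("customer:42")
second_read = connector.get("customer:42")
```

The application code only ever talks to `connector`, so nodes could be added behind it without changing that code, which is the property the paragraph above attributes to the connectivity layer.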

The foundation of Tungsten’s clustering is its Replicator, which reads the source database’s log and detects, within seconds of input, entries that record changes to or additions of data. The Replicator then forwards this information to the Replicator at the destination database, again within seconds. The low latency all but eradicates any possibility of inaccuracies or redundancies, and enables nearly instantaneous updates for data. Moreover, the process is extremely straightforward and requires no code or alterations to the databases other than simply turning on the Replicators, which is ideal for laymen and potential business users.
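The log-reading approach described above can be sketched in a few lines of Python. This is an illustration of log-based replication in general, not Continuent's implementation: `ChangeEvent`, `SourceDatabase`, `LogReplicator`, and the in-memory dictionary used as the destination are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    seqno: int                  # position of the event in the source log
    op: str                     # "insert", "update" or "delete"
    key: str
    value: Optional[Any] = None


@dataclass
class SourceDatabase:
    """Hypothetical source that records every write in an append-only log."""
    rows: Dict[str, Any] = field(default_factory=dict)
    log: List[ChangeEvent] = field(default_factory=list)

    def write(self, op: str, key: str, value: Optional[Any] = None) -> None:
        if op == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append(ChangeEvent(len(self.log) + 1, op, key, value))


class LogReplicator:
    """Reads the source log and applies unseen events to the destination."""

    def __init__(self, source: SourceDatabase, target: Dict[str, Any]):
        self.source = source
        self.target = target
        self.last_applied = 0   # seqno of the last event already applied

    def poll(self) -> int:
        # Only events after last_applied are read, so source tables are
        # never scanned and no event is applied twice.
        pending = self.source.log[self.last_applied:]
        for event in pending:
            if event.op == "delete":
                self.target.pop(event.key, None)
            else:
                self.target[event.key] = event.value
            self.last_applied = event.seqno
        return len(pending)


source = SourceDatabase()
target: Dict[str, Any] = {}
replicator = LogReplicator(source, target)

source.write("insert", "order:1", {"total": 40})
source.write("update", "order:1", {"total": 55})
source.write("insert", "order:2", {"total": 10})
source.write("delete", "order:2")

applied = replicator.poll()     # applies the four pending events
idle = replicator.poll()        # nothing new in the log
```

Because the replicator tracks only a position in the log, the applications writing to the source need no changes, which matches the article's point that nothing has to be altered beyond turning the Replicators on.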

Practical Results

Although applications of Tungsten vary according to the specific business objectives of a particular organization, the ability to quickly copy data across databases transcends industries. Continuent’s marketing automation customer Marketo faced a fairly common situation when it sought to move its database from MySQL to Oracle because of the latter’s well-known reliability and scalability.

Instead of adjusting each and every application previously set to interact with MySQL to now interact with Oracle (an option which is both time-consuming and costly), Marketo used Tungsten’s replication capabilities to simply copy relevant transaction data between the databases. The organization was therefore able to keep MySQL as the primary database to which all of its software was attached, while leveraging Oracle’s reliability and scalability to increase the number of transactions it was conducting. Hodges reflected on the tremendous difference that Continuent made:

“When we first started Marketo was probably doing less than a hundred million transactions a day; they’re now up to 700 million and still growing. We saved them clearly millions of dollars in licensing costs from Oracle. More importantly, it saved them from a very painful migration to take software that was running well on MySQL and put it on Oracle. That would have been a major change for them and one they didn’t want to handle.”

Another customer, Zappos, also benefited from moving data from MySQL to Oracle. In addition to being able to do so in close to real time without having to alter the specific applications it was running on the two databases, the organization also takes advantage of the low system load that the replication process imposes by processing many different requests at the same time.

Open Source

Although Continuent Tungsten requires licensing fees and offers additional support that customers must pay for, one of the most valuable aspects of Tungsten Replicator is that it is open source and has no licensing fees. The open source aspect of this application provides a number of tangible benefits to the enterprise, including:

  • Easy accessibility: Users can simply download Tungsten Replicator, configure it, and begin replicating data per their needs. There is no lengthy decision-making process that requires funds allocation or raises integration concerns; customers can readily see for themselves whether the technology can benefit their specific processes.
  • Great Company: A number of the most popular databases today are open source (particularly MySQL, Hadoop, and various NoSQL offerings). As such, the development team at Continuent has issued Replicator with similar cost models (subscription-based with flexible scalability) to many of these options. Additionally, Continuent is certified to work with many of the aforementioned database types, which helps to facilitate easier and tighter integration.
  • Agility: The agility benefits of Tungsten Replicator are two-fold, applying both to end users and to Continuent. End users can modify the software at their leisure to tailor it to their own uses, while the open source community gives Continuent developers an important avenue of feedback, showing them which aspects of the product are working and why, and so informing future development.

The Cloud and More

The final major advantage to utilizing replication and clustering applies to Cloud Computing, which is steadily gaining ground in the wake of Big Data and real-time analytics. Loading transactional data to and from the Cloud is both expedient and reliable via the clustering and replication technologies Continuent provides, which operate as a virtual bridge between the bricks and mortar realm and the cloud itself.

Tungsten’s viability for Cloud computing and real-time analytics, in addition to its penchant for significantly enhancing the open source community, demonstrates that its value extends to Big Data initiatives and beyond. But, as a retrospective of the significance of the recent 3.0 release indicates, its most utilitarian purpose may be in restructuring the Data Warehouse concept by providing nearly instantaneous replication and clustering of data inside Hadoop. In this regard, Hadoop’s usage can actually transcend Big Data and simply become a routine part of enterprise warehousing.
