Data Replication: The Crux of Data Management

By Jelani Harper

There are numerous daily developments in the data landscape involving new technologies and applications impacting the Internet of Things, Big Data, the consumerization of IT, and the burgeoning empowerment of the business. Despite so much going on, the vast majority of interest regarding data and its influence throughout contemporary society is easily stratified into two categories that are far from mutually exclusive.

According to Continuent CEO Robert Hodges: “If you look at where’s the investment going on in databases right now, there’s really two big places: analyzing data, which is one of the main reasons why people are moving data, and the other thing of course is moving data into the cloud.”

Granted, many organizations move data to the Cloud to perform analytics. However, others need to consolidate and move data to perform analytics within their physical infrastructure.

And although virtualization technologies can aggregate data without actually moving it from its physical location, unpleasant realities such as failure and maintenance all but mandate data replication so that files are safely backed up.

The question is how to do so quickly, consistently, and cost-effectively, so that the common reasons for copying data into different databases or the Cloud (performing real-time analytics, consolidating both traditional strategic and newfound tactical data, or aggregating transactional data) are optimized and easily repeatable.

The two most popular models for doing so are the synchronous multi-master paradigm and the asynchronous master/slave paradigm. According to Hodges, however, one is considerably more effective than the other:

“Master/slave asynchronous replication is the most scalable way to maintain large quantities of data. It’s a proven model that’s been out there since the first data replication products appeared in the early 1990s. I think it’s important for people not to forget it when a lot of new technologies come along and people make big claims about them. But this is the model that really works.”

Multi-Master Synchronous Replication vs. Master/Slave Asynchronous Replication

Most data replication uses one of these two models, and replication itself is essential for recovering from failures, performing maintenance, and moving data. Multi-master synchronous replication updates data automatically in different locations (whether in different geographic localities or in different parts of the enterprise's infrastructure) and uses a decentralized system in which there is no single master database.

In comparison, the master/slave method has a single master database that replicates to the other units without synchronous updating. Data is first written to the master database and committed to its storage, and is then replicated to the slave databases, which in certain instances can be done in real time. The two models vary considerably with regard to the following key aspects of moving data:

  • Speed and Performance: Although the multi-master paradigm might appear faster because of its automatic updates, in practice it is considerably slower than its counterpart for most data copying. All databases have to communicate with one another before an update takes place; although each exchange takes only fractions of a second, latency can become considerable for transactions that require repeated changes and for those that span great distances (such as between data centers). In these instances the master/slave model (which is used in products such as Continuent Tungsten and Continuent’s Tungsten Replicator) can outperform the multi-master model by a factor of 10.
  • Transaction Variability: The multi-master model is also slowed by transaction variability, specifically when a workload mixes small and large transactions. The larger transactions can impede the smaller ones by blocking the communication between databases, which again slows the entire data duplication process and every user of that database.
  • Availability: The latency issues described above effectively prevent a multi-master database from functioning until the communication between databases has taken place. With the master/slave model, a change made by an application is committed immediately, without any waiting, so the user can keep using the database and the application while replication takes place, regardless of distance or transaction size.

“Synchronous replication requires the application to always wait until you’re sure that the change has got to a different location,” Hodges said. “Asynchronous replication gives you low latency on the application.”
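
The difference Hodges describes can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of any particular product: the `Replica` class, the round-trip figures, and the 1 ms local write cost are all assumptions chosen for illustration. It compares how long an application waits when each commit must be acknowledged by every replica with how long it waits when the commit returns after the local write.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    round_trip_ms: float  # assumed network round trip from the primary


LOCAL_WRITE_MS = 1.0  # assumed cost of a local commit; illustrative only


def sync_commit_latency(replicas):
    """Synchronous: the commit returns only after the slowest replica acknowledges."""
    slowest = max((r.round_trip_ms for r in replicas), default=0.0)
    return LOCAL_WRITE_MS + slowest


def async_commit_latency(replicas):
    """Asynchronous: the commit returns after the local write; replicas are updated later."""
    return LOCAL_WRITE_MS


# Hypothetical topology: one nearby replica and one in a distant data center.
replicas = [Replica("same-rack", 0.5), Replica("remote-datacenter", 80.0)]
txn_count = 100  # a job that issues 100 dependent transactions in sequence

sync_total = txn_count * sync_commit_latency(replicas)
async_total = txn_count * async_commit_latency(replicas)
print(f"synchronous:  {sync_total:.0f} ms")
print(f"asynchronous: {async_total:.0f} ms")
```

Under these assumed numbers the synchronous job is dominated by the distant replica, which is why repeated changes across data centers are where the gap is widest. The sketch measures only how long the application waits; it says nothing about how far the asynchronous replicas lag behind the master.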

Maintenance and Failures

Perhaps the clearest demonstration of the difference in availability between the two models comes during failures and maintenance. Although complete failures are less common than the need to maintain and update the software and infrastructure behind a database, there is a world of difference in the way the multi-master and master/slave models handle these periods of downtime.

When the master database in the master/slave model cannot replicate data because of a failure or maintenance, it simply keeps attempting to copy the data. Once the connection to its other databases is restored, it updates them automatically and remains in use throughout. Models that use synchronous replication, however, may need to send a full copy of the entire data set, not just the portion that needs updating, to make up for their downtime. With failure an ever-present concern and database upgrades and maintenance a necessity, this final difference may be the most convincing argument for the superiority of the asynchronous master/slave paradigm:

“Accounting systems, marketing campaign automation, automobile information, security data and many others…there’s a very wide range of applications that use this model,” Hodges acknowledged. “As I said, it’s one that’s really proven and it’s so common that actually people don’t think about it as being anything special, but it is special because it works so well for so many customers.”
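
The catch-up behavior described above, in which a slave that has been offline receives only the changes it missed, can be sketched as follows. This is a toy illustration under stated assumptions, not how Tungsten Replicator or any other product is implemented: the `Master` and `Slave` classes, the in-memory change log, and the `applied` position counter are all hypothetical.

```python
class Master:
    def __init__(self):
        self.log = []  # ordered change log of (key, value) entries

    def write(self, key, value):
        # Commit locally and return at once; no waiting on any slave.
        self.log.append((key, value))


class Slave:
    def __init__(self):
        self.data = {}
        self.applied = 0    # how many log entries have been applied so far
        self.online = True

    def pull(self, master):
        """Apply only the log entries this slave has not yet seen."""
        if not self.online:
            return 0
        pending = master.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


master, slave = Master(), Slave()
master.write("a", 1)
slave.pull(master)             # slave is in step with the master

slave.online = False           # maintenance window or failure
master.write("b", 2)           # the master keeps accepting writes
master.write("a", 3)
slave.pull(master)             # nothing is applied while the slave is offline

slave.online = True
caught_up = slave.pull(master) # only the two missed changes are transferred
print(caught_up, slave.data)
```

Because the slave remembers its position in the log, recovery costs are proportional to the changes missed during the outage rather than to the size of the whole data set.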

Synchronous Replication and More

Although Continuent’s replication products largely use the asynchronous master/slave method of data replication, the company is currently considering new offerings that incorporate synchronous replication. Within certain fields, synchronous replication can add value to the Data Management landscape, such as in the medical field, where governmental regulations make it absolutely necessary for organizations to be accountable for their data at all times. The lag in performance for synchronous replication can be minimized (especially across data centers) by using local copies of data for replication purposes.

Lessons Learned

In retrospect, it is important to realize that data must move at some point. Typically it does so for Cloud access and for aggregating analytics. Despite the tangible benefits that virtualization provides in those processes, data replication is integral to recovering from failures and accounting for downtime due to common maintenance procedures.

Of the two most widely used methods for copying data, the asynchronous master/slave model is consistently faster, applicable to varying sizes of jobs, and offers greater availability in the event of failure or maintenance procedures. In many instances products using the synchronous multi-master method can perform the same functions as their master/slave counterparts, but much less efficiently and with considerable issues. With the mission-critical applications that can hinge upon data aggregation for analytics and Cloud access, it is pivotal to employ the most effective means of data replication.
