Click to learn more about author Dave Bermingham.
When you think about how best to ensure the high availability (HA) of your applications and data, you need to think about more than just the speed with which you can fail over to a standby system if your database suddenly fails. Yes, you should be prepared for full failover if a catastrophe strikes your data center, but full failover is a big hammer to bring down if the “failure” of your database is really something as slight as a stalled process.
When you evaluate HA solutions, there are four critical features you should look for beyond robust failover management:
- Application-level monitoring
- Avoidance of unnecessary failover
- Application awareness for complex mission-critical solutions
- Automation and ease of management
Let’s look at each of them in turn.
Application-Level Monitoring
Look closely at a cloud provider’s service-level agreement (SLA) and you may find that the SLA ensures only that at least one of the virtual machines (VMs) in your HA configuration will be available 99.99% of the time. That’s no guarantee that you’ll be able to access or interact with your data 99.99% of the time – so you need an HA solution that takes the availability of your applications and data into consideration. There are innumerable reasons why you may lose access to your data that have nothing to do with whether the underlying VM is operating. There could be operating system or application software bugs, stalled processes, storage or memory failures, and more – and none of these is covered by a cloud service provider’s SLA. You need to look for an HA solution that can monitor errors, events, and anomalies throughout your application infrastructure.
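To make the gap between VM availability and application availability concrete, here is a minimal Python sketch. It is an illustration only, not how any particular HA product works: the probe names (`vm_reachable`, `db_process_running`, `db_answers_query`) and the `failed_layers` helper are hypothetical, and the probes are simulated rather than wired to a real system.

```python
from typing import Callable, Dict, List

# Hypothetical probe type: a callable that returns True when its layer is healthy.
Probe = Callable[[], bool]


def failed_layers(probes: Dict[str, Probe]) -> List[str]:
    """Run every probe and return the names of the layers that failed."""
    failed = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            # A probe that raises (timeout, refused connection, ...) counts as a failure.
            healthy = False
        if not healthy:
            failed.append(name)
    return failed


def stalled_query() -> bool:
    # Simulates a database that is running but no longer answers queries.
    raise TimeoutError("database did not answer within the deadline")


probes = {
    "vm_reachable": lambda: True,        # roughly what a VM-level SLA covers
    "db_process_running": lambda: True,  # the process exists...
    "db_answers_query": stalled_query,   # ...but the application is not usable
}

print(failed_layers(probes))
```

In this simulated case the VM and the database process both look healthy, and only the application-level probe reveals that users cannot reach their data.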
Avoidance of Unnecessary Failover
Full failover is appropriate when there’s a catastrophic failure of your production application infrastructure – when you can’t access a critical database because the entire data center has been flooded, for example. But such break-glass moments are rare. Far more common are the operating system and application errors described above. Unless your HA solution can determine that a stalled process is the real culprit keeping you from accessing your data, it may simply fail over to secondary infrastructure when that strong a response is unwarranted.
HA monitoring features that can detect and respond proactively to a wide range of application, operating system, network, and other noncatastrophic events can help ensure ongoing access to data and help you avoid unnecessary failover. If your HA solution is intelligent enough to know what to watch for within your application environment – and to know what to do when it encounters issues – then you can regain access to your critical applications and data even faster than you would in a fast failover scenario. It may take only seconds to fail over between infrastructures, but it may take only fractions of a second to restart a process or queue.
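The "restart first, fail over last" idea can be sketched as a small escalation routine. This is a simplified illustration under stated assumptions: `recover`, `is_healthy`, `restart_locally`, `fail_over`, and the `max_restarts` limit are all hypothetical names and parameters, and a real HA product would apply far richer policies.

```python
from typing import Callable


def recover(
    is_healthy: Callable[[], bool],
    restart_locally: Callable[[], None],
    fail_over: Callable[[], None],
    max_restarts: int = 2,  # assumed limit on local recovery attempts
) -> str:
    """Try the cheapest fix first; escalate to full failover only if it fails."""
    if is_healthy():
        return "healthy"
    for _ in range(max_restarts):
        restart_locally()      # e.g. restart a stalled process or queue
        if is_healthy():
            return "restarted"
    fail_over()                # last resort: move to the standby infrastructure
    return "failed_over"


# Simulated application whose stalled process is fixed by one local restart.
state = {"up": False, "restarts": 0, "failed_over": False}


def app_is_healthy() -> bool:
    return state["up"]


def restart_app() -> None:
    state["restarts"] += 1
    state["up"] = True


def fail_over_app() -> None:
    state["failed_over"] = True


print(recover(app_is_healthy, restart_app, fail_over_app))
```

Here the simulated application recovers after a single local restart, so the failover branch is never reached.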
Application Awareness for Complex Mission-Critical Solutions
Complex applications from SAP and Oracle involve many interlinked components running on different systems, and restarting those applications – or even individual parts of those applications – requires that certain systems and processes be started in a certain order. Failure to restart these applications in the proper order can result in a significant delay getting back online. The good news is that certain HA solutions come with features specifically designed to streamline HA in these kinds of complex environments. Not only do they provide a specific kind of application awareness to help SAP, Oracle, and other mission-critical systems continue to perform at their peak, but they also ensure that, should portions need to be restarted, the restart occurs automatically, smoothly, and in the proper sequence to keep interruption to a minimum.
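One common way to derive a safe restart sequence is to record which component depends on which and then sort topologically. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The component names and dependencies are invented for illustration and do not describe a real SAP or Oracle landscape; `restart_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def restart_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return components ordered so that each one starts after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical landscape: each key lists the components it needs to be up first.
dependencies = {
    "database": set(),
    "message_server": {"database"},
    "app_server": {"database", "message_server"},
    "web_dispatcher": {"app_server"},
}

for step, component in enumerate(restart_order(dependencies), start=1):
    print(step, component)
```

Keeping the dependencies as data rather than as a hand-written script means the sequence is recomputed whenever a component is added, and a circular dependency is reported instead of silently producing a bad order.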
Automation and Ease of Management
As one might imagine, configuring and managing an HA solution capable of providing this kind of monitoring and response support could become extremely complicated. Look for an HA solution that is mature enough to rely on proven, wizard-driven GUIs that can perform the critical low-level tasks for you. Look also for a solution whose monitoring and notification tools will integrate cleanly with your existing application and performance management toolkit. This will minimize the number of screens and systems your team must track and enable your team to make the most of the solutions it is already using.
An HA solution designed to provide these key features gives an organization a range of situationally appropriate tools with which to ensure the high availability of applications and data. Failover is an important tool, but it’s not the ideal tool for every situation.