Configuring SQL Server for High Availability in Azure

essidsolutions

If you’re moving your SQL Server infrastructure to Azure, you’ll learn that you’ve got an abundance of options to ensure high availability and disaster recovery. A closer look will reveal not all options will guarantee the same level of availability.

Suppose you’re thinking about moving your critical SQL Server infrastructure to Azure. In that case, you’ll soon learn that you’ve got an abundance of options when it comes to ensuring high availability (HA) and disaster recovery (DR). An HA configuration is essential if your SQL Server system needs to be available at all times, for an HA configuration will ensure that you or your customers can access that database 99.99% of the time. Expressed in terms of minutes, HA guarantees that you’ll experience unexpected downtime of less than 5 minutes per month.

A DR configuration, in contrast, isn’t designed for immediate and automatic failover if your primary infrastructure goes dark. Instead, a DR solution creates infrastructure and a mirrored database in a remote location that you can use if a catastrophe compromises all the infrastructure in your primary region (as the back-to-back hurricanes did in Puerto Rico a few years ago). 

As noted, Azure appears to offer quite a list of options. You might think about using Azure Site Recovery or SQL Server’s Availability Groups (AG) feature. Alternately, you might consider configuring a multi-node failover cluster instance (FCI) that relies on Premium File Share or Storage Spaces Direct for cloud storage—or even a SANless clustering solution that doesn’t rely on shared storage at all. 

But are all these really options? If you want true HA and a viable option for DR, a closer look reveals that only some of these options will guarantee the support you seek. 

The Azure Site Recovery Option

Consider Azure Site Recovery. Azure Site Recovery replicates your virtual machines (VMs) and data to an Azure Availability Zone (AZ) in a separate region (an AZ is essentially a data center). If your primary SQL Server infrastructure goes dark—for whatever reason—you’d be able to pivot and use the replicated infrastructure in a remote AZ to run your critical operations. This promises availability, but is it high availability? The SLA for Azure Site Recovery stipulates a recovery time objective—the maximum amount of time it will take to bring your services back online in the event of an unplanned outage—of two hours. But an HA configuration guaranteeing 99.99% availability would promise to have your SQL Server operations back up and running in seconds, with no more than 4.38 minutes of interrupted access per month.

The Availability Groups Option

Azure Site Recovery is a better solution for DR than HA. A better Microsoft option for HA relies on SQL Server’s built-in Availability Groups (AG) feature. This feature automatically replicates user-defined SQL databases from your primary SQL Server infrastructure to live backup infrastructure in a separate AZ within the same region. The SLA for AG guarantees that at least one of your SQL Server instances will be available 99.99% of the time, which makes it a true HA solution. 

However, the caveat about AG is that its robustness depends on which edition of SQL Server you are using. SQL Server Standard Edition includes Basic Availability Group functionality. That will replicate one SQL Server database to one replication site. The far more costly SQL Server Enterprise Edition includes Always-On Availability Groups, replicating more than one SQL Server database. It will replicate to more than one replication site. Indeed, if you added a third backup site in a remote AZ, you could use AG for both SQL Server HA and DR. It should be noted, though, that AGs replicate only user-named SQL databases. AG does not replicate the system databases that hold job information, passwords, and similar. This feature also applies to any non-SQL Server files that might reside in storage. To ensure HA or DR for that data will require additional tools. 

The Failover Cluster Options

What about a traditional multi-node FCI using Premium File Share or Storage Spaces Direct? A failover clustering approach traditionally configures SQL Server on multiple VMs, interacting with the same storage infrastructure (a SAN, for example). If the active instance of SQL Server goes offline for some reason, the FCI fails over to a backup node in the cluster where another instance of SQL Server interacts with the database files residing on the shared storage. 

While you can configure SQL Server to run on such an FCI using Storage Spaces Direct, you can’t configure Storage Spaces Direct to store in multiple Azure Availability Zones. That limitation effectively moots an FCI using Storage Spaces Direct as a true HA solution. It’s simply impossible to ensure 99.99% availability unless you have infrastructure running in more than one data center. It doesn’t matter how much redundancy may be inside the data center if the entire data center goes offline—and whole data centers do go offline unexpectedly.

Azure’s Premium File Share options offer ways to overcome the availability limits of Storage Spaces Direct, but creating an FCI with a Premium File Share poses its own challenges. Azure offers a variety of Zone-Redundant shared storage options, which are designed to remain accessible to the VMs in an FCI even if the VMs and storage in the primary AZ go offline. While that sets up the conditions for true HA, the approach may involve strictures that are unsuitable for any organization. Certain popular storage-associated technologies cannot be used on Zone-Redundant Premium File shares (including read-only host caching, disk bursting, or Azure Disk Encryption). You can use Azure disk backup, but no other SQL Server backup solution—nor can you use Azure Site Recovery to create a DR instance of your deployment. 

So, yes, in theory, you can use Zone-Redundant Premium File Shares to create a HA Azure infrastructure for SQL Server, but only if you can forego the host caching, encryption, and other features. You’ll also need to find another method of replicating your infrastructure to a remote site if you want to configure for DR.

Finally, there’s the option to configure SQL Server for HA on Azure by creating an FCI using third-party SANless clustering software. This involves building out an FCI that spans at least two AZs (or even a third AZ in a remote region if you want to add a DR component on top of HA). Instead of using storage shared among the cluster nodes, though, you would attach storage to each VM. The SANless clustering software replicates all the data in the storage on the active VM to the storage systems attached to the secondary VMs. If the primary VM goes offline for any reason, the FCI immediately fails over to a secondary node, where an instance of SQL Server, with all the replicated data, is ready to take over.

With synchronous replication and high-speed failover, an FCI built with SANless clustering technology can provide the HA you’re seeking in an Azure configuration. A number of emergency responders that share operational information among different jurisdictions’ computer-aided dispatch systems have configured their systems in Azure in this way to ensure that their critical systems are always accessible. The caveats? This approach involves the use of a third-party tool, so you’ll have to license SANless clustering software for each node in your FCI.

But, licensing the SANless clustering software is much less expensive than licensing SQL Server Enterprise Edition if you would only be deploying SQL Server Enterprise to gain access to the Always On AG functionality. An added advantage over AG is that the SANless Clustering software is application agnostic. It’s just copying blocks of data—any data—from one storage system to another. So, it will ensure the HA of all your SQL databases, even those that AG does not replicate, as well as any other file or application that may be in storage on your FCI nodes.

Which option would you choose when moving your SQL server infrastructure to Azure? Share with us on LinkedInOpens a new window , TwitterOpens a new window , or FacebookOpens a new window . We’d love to hear from you!

MORE ON DISASTER RECOVERY: