Understanding the Options for SQL Server High Availability

essidsolutions

This article helps administrators make more informed choices about the myriad of options for proving high availability and disaster recovery protections for SLQ Server databases for both Windows Server and Linux in private, public and hybrid clouds.

There are numerous choices confronting Database and System Administrators tasked with ensuring high availability for SQL Server. The sources of these choices include SQL Server itself based on its different versions and editions, features included in the underlying Windows Server and Linux operating systems, special provisions available in the cloud, and purpose-built failover clustering software available from third parties.

These options all depend on some form of clustering that consists of a primary (active) and one or more standby instances of the application. But there are significant differences in how the different HA solutions replicate data, detect failures, and handle failover and failback. Because highly available configurations are inevitably more expensive, cost is also a factor, including for licensing, and both the initial implementation and ongoing operational expenditures.

This article provides an overview of clustering options included with and available for SQL Server to help administrators make more informed choices. As will be shown, some options are actually unsuitable for HA purposes, and are included here only to show how they fit into related backup, archiving and disaster recovery (DR) strategies.

Options Included with SQL Server

Always On Failover Cluster Instances has been a standard feature since SQL Server 7 running on Windows NT. FCIs have two desirable traits: inclusion in the Standard Edition of SQL Server; and protection for entire SQL Server instances, including system databases. But FCIs require some form of cluster-aware shared storage, which is not available in the public cloud. Shared storage can and often does exist in in traditional data centers, however, enabling FCIs to be built upon Windows Server Failover Clustering (or WSFC, also a standard feature).

Always On Availability Groups replaced database mirroring (covered below) in SQL Server 2012 Enterprise Edition, and this feature is also included in SQL Server 2017 for Linux. This is SQL Servers more robust HA/DR offering, capable of delivering rapid, automatic failovers with no data loss for HA, and/or protecting against widespread disasters by leveraging asynchronous replication with minimal data loss. But it requires licensing the more expensive Enterprise Edition, making it cost-prohibitive for many applications, and it lacks protection for the entire SQL instance. For Linux, which lacks the equivalent of WSFC, there is a need for additional commercial and/or open source software to create fully functional HA failover clusters. A less capable Basic Availability Groups was added to the Standard Edition of SQL Server 2016, but this feature supports only a single database per AG and does not support readable secondaries.

A notable disadvantage with application-specific options like Availability Groups is that administrators need to use other HA and/or DR solutions for all non-SQL Server applications. Having multiple solutions inevitably increases both initial and ongoing costs, which is why many organizations prefer using separate application-agnostic or general-purpose solutions.

Options Available for SQL Server

Backup procedures are routinely used with all applications, whether critical or not, for archiving purposes, and may be suitable for the DR needs of some. But full and incremental backups fail to meet the Recovery Time and Recovery Point Objectives needed for HA and DR.

Database mirroring is a form of data replication, and as noted above, the much more capable Always On Availability Groups replaced this feature in SQL Server 2012. Because mirroring is now in maintenance mode, it will likely be removed in a future version of SQL Server for Windows, and was not included in SQL Server 2017 for Linuxand never will be. If mirroring is working well for existing applications, there is no need make any changes at this time. But this option is certainly a risky choice for any new HA configurations.

Log shipping provides a means to reconstruct a database by applying the transaction logs from the active instance to a warm standby instance, thereby making it current. But replaying logs can take a considerable amount of time, making this option suitable only for DR and not for HA, which requires a hot ready-to-run standby.

Hypervisors have their own high availability features, and these do provide automatic recovery from failures at the host level. But they do nothing to protect against failures affecting the operating system of the guest virtual machines or the applications running in them. In effect, these features only assure dial tone to VMs, falling far short of what is required for HA.

Public clouds all offer standard and optional ways to maximize the availability of services, and these are usually accompanied by money-back guarantees in service level agreements. But just like the hypervisors, the SLAs only guarantee dial tone at the server level. Additional provisions are, therefore, needed to ensure that SQL Server data is being replicated from the active to the standby instances, and that clients can automatically reconnect after a failover. So while it can be advantageous to leverage a cloud service providers infrastructure, additional steps must be taken to ensure HA for SQL Server.

Storage Spaces Direct (S2D) is a form of software-defined storage that enables direct-attached drives to be pooled for sharing among the multiple servers in a cluster. S2D addresses the lack of storage area networking or other forms of shared storage in public clouds, which has made it difficult to create failover clusters. S2D was introduced in Windows Server 2016 Datacenter Edition, and is supported in SQL Server 2016 and later. While S2D is compatible with SQL Server FCIs, there remain substantial limitations, especially the inability to function across availability zones or regions to protect against failures at the datacenter level.

Purpose-built failover clustering solutions provide application-agnostic HA and DR protection in both Windows Server and Linux environments across public, private and hybrid clouds. They are implemented entirely in software that creates, as the name implies, a cluster of servers (with their attached storage) that affords automatic failover. All such solutions include real-time data replication, continuous monitoring for detecting failures at the application level, and configurable policies for failover and failback. Most also offer a variety of value-added capabilities to simplify implementation and operation, and some even have special provisions for different versions and editions of SQL Server.

Choosing the Optimal Option(s)

The many HA/DR options outlined here can be combined in a variety of ways to create an optimal HA configuration for each application. One popular configuration combines SQL Server Failover Cluster Instances (included in the less expensive Standard Edition) running in the cloud (also to reduce costs) with a third-party failover clustering solution that adds data replication to take the place of the shared storage normally required with a SQL Server FCI. While such a configuration may seem complicated, it is actually quite easy to implement and operate, and has proven to work well for SQL Server and other applications requiring high availability.