Avoiding the Next Cloud Outage: Keeping SQL Server Up and Running Through Disaster

essidsolutions

Cloud SLAs may guarantee 99.99% availability of at least one VM in your service infrastructure, but that guarantee does not extend to your applications and data. This article by David Bermingham, Technical Evangelist, SIOS Technology, explores ways to keep SQL Server running during an unanticipated cloud service disruption.

Today’s cloud-based infrastructure-as-a-service (IaaS) offerings can relieve your IT team of many of the burdens long associated with day-to-day management of a traditional IT infrastructure. Hardware maintenance? Check. Performance and capacity? You can add or remove resources at the click of a button. Reliability? Those high availability SLAs can make you feel like you’re in the best of hands.

Is there anything left to worry about? Of course there is! Azure, AWS, and Google Cloud have all had their share of unanticipated service disruptions, several of them affecting entire regions. While certain IaaS SLAs may guarantee that at least one VM in your infrastructure will be available 99.99% of the time, they do not guarantee that your applications and data will be available. That means your IT professionals still need to plan for outages, even if they no longer have to maintain the infrastructure in the data centers themselves.

What are your options for ensuring the availability of your SQL Server when disaster strikes the cloud itself? A variety of Windows-native and third-party approaches can reduce the risk of SQL Server service disruption, even if the cloud is affected. Different solutions have different strengths and weaknesses, however, so it behooves your IT team to understand the tradeoffs to determine which approach will best support your organization’s availability needs.

What to Expect When You Least Expect It

AWS, Azure, and Google Cloud all offer configuration options designed to protect your SQL Server infrastructure from catastrophic failure. Depending on how you deploy your VMs, the associated SLAs guarantee that at least one VM will be available up to 99.99% of the time. Moreover, these options give you ways to ensure that a VM still running after an unexpected disruption can access your SQL Server data and continue to provide services for your organization. Each of the approaches below relies on services native to Windows Server and SQL Server, and, depending on your availability requirements, one of them may provide the level of availability your organization needs.

Ensuring Continuity Using Storage Spaces Direct

On Azure, you can take advantage of a Windows Server feature called Storage Spaces Direct, first introduced in Windows Server 2016 Datacenter Edition. This approach strives to ensure SQL Server continuity by pooling the local storage of two or more servers into virtual storage that all of them can share. By configuring the VMs to be part of an Azure Availability Set, you can ensure that the VMs always run in different racks (known as Fault Domains in Azure parlance) in the same Azure data center. This Storage Space can then be used as shared storage for a SQL Server Failover Cluster Instance (FCI) that includes the VMs in the Availability Set. If the active VM fails, the failure is detected automatically, and SQL Server fails over to another node in the FCI, which continues to access the data in the shared storage.

The Storage Spaces Direct approach has one obvious shortcoming that you need to weigh. Storage Spaces Direct does not support clusters whose nodes reside in different data centers. This constrains you to the 99.95% SLA associated with Availability Sets, rather than the 99.99% SLA associated with VMs deployed across Availability Zones, which span separate data centers. In the event of a disruption severe enough to bring down an entire data center, all the servers in your cluster will go offline.
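To put the gap between those two SLAs in concrete terms, the short Python sketch below converts an availability percentage into the downtime it permits. It assumes a 365-day year and a 30-day month; actual SLAs define their own measurement windows and exclusions, and the function name is purely illustrative.

```python
# Sketch: translate an availability SLA into the downtime it tolerates.
# Assumption: a 365-day year and a 30-day month. Real cloud SLAs define
# their own measurement windows, exclusions, and service-credit rules.

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes


def allowed_downtime_minutes(sla_percent: float, period_minutes: int) -> float:
    """Return the downtime, in minutes, that an SLA permits over a period."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in [("Availability Set", 99.95), ("Availability Zones", 99.99)]:
        monthly = allowed_downtime_minutes(sla, MINUTES_PER_MONTH)
        yearly = allowed_downtime_minutes(sla, MINUTES_PER_YEAR)
        print(f"{label} ({sla}%): {monthly:.1f} min/month, {yearly:.1f} min/year")
```

Under these assumptions, 99.95% works out to about 21.6 minutes of permitted downtime per month versus about 4.3 minutes for 99.99%: five times as much room for an outage.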

Finally, a less obvious but potentially more significant shortcoming is that running a SQL Server FCI on Storage Spaces Direct requires SQL Server 2016 or later. If you're trying to ensure the availability of an earlier version of SQL Server, Storage Spaces Direct won't work for you.

Ensuring Continuity Using Always On Availability Groups

If the possibility of an entire data center going down poses an unacceptable level of risk, SQL Server offers an option called an Always On Availability Group, which replicates user-defined databases across multiple SQL Server instances, which can be located in multiple data centers. While a catastrophic storm or regional grid failure may take an entire data center offline, the same event is unlikely to affect the operations of a data center hundreds or thousands of miles away.
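The case for distant replicas rests on their failures being largely independent. Under that simplifying assumption, the probability that every replica is down at once is the product of the individual probabilities, as the Python sketch below illustrates. The function name and the per-site figures are hypothetical, and the model ignores correlated failures (for example, a provider-wide control-plane problem), failover time, and replication lag, so treat its result as an optimistic estimate of what the infrastructure could deliver, not as an SLA.

```python
from typing import Iterable


def combined_availability(replica_availabilities: Iterable[float]) -> float:
    """Probability that at least one replica is up.

    Assumes replica failures are statistically independent, which is
    optimistic: real outages can be correlated across sites.
    """
    prob_all_down = 1.0
    for availability in replica_availabilities:
        # Probability that this particular replica is down.
        prob_all_down *= 1.0 - availability
    return 1.0 - prob_all_down


if __name__ == "__main__":
    # Hypothetical: two data centers, each individually available 99.95% of the time.
    single_site = 0.9995
    print(f"One site:  {single_site:.6%}")
    print(f"Two sites: {combined_availability([single_site, single_site]):.6%}")
```

With two independent sites at 99.95% each, the chance that both are down simultaneously falls to 0.0005², or 0.000025%, which is why a second, distant data center matters so much in this reasoning.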

Yet there are tradeoffs to the Always On Availability Group approach. There's no shared virtual SAN in an Always On Availability Group. Instead, to ensure availability of data, SQL Server replicates transactions for each database in the group from the primary VM to the secondary VMs, each of which maintains its own copy of the data on local storage. If a secondary VM is called into service, it works from the copy of the data that has been replicated to its local storage from the primary SQL Server database.

Two less obvious, but potentially more problematic, tradeoffs are these. First, Always On Availability Groups replicate only the user-defined databases in your SQL Server infrastructure. They don't replicate key system databases, which means that server-level objects such as logins (and their passwords) and SQL Server Agent jobs are not automatically replicated to the other replicas. You'll want to take this into account so as not to be caught unaware. Second, Always On Availability Groups can be very expensive. The full-featured implementation requires SQL Server Enterprise Edition, which may otherwise be unnecessary if your application needs only the features of Standard Edition. Standard Edition offers only Basic Availability Groups, which are limited to a single database and two replicas.
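Because logins and SQL Server Agent jobs live in the master and msdb system databases, which an Availability Group does not replicate, teams typically need some way to detect drift between replicas. The Python sketch below shows only the comparison logic; the inventory format and the function name are hypothetical. In practice you might populate each inventory by querying sys.server_principals (logins) and msdb.dbo.sysjobs (Agent jobs) on each replica.

```python
from typing import Dict, Set

# Hypothetical inventory format: object category -> set of object names.
Inventory = Dict[str, Set[str]]


def missing_on_secondary(primary: Inventory, secondary: Inventory) -> Dict[str, Set[str]]:
    """Return, per category, the objects present on the primary but not the secondary."""
    gaps: Dict[str, Set[str]] = {}
    for category, names in primary.items():
        # Anything the secondary lacks would be unavailable after a failover.
        missing = names - secondary.get(category, set())
        if missing:
            gaps[category] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical example data for two replicas.
    primary = {
        "logins": {"app_user", "report_reader"},
        "agent_jobs": {"nightly_backup", "index_maintenance"},
    }
    secondary = {
        "logins": {"app_user"},
        "agent_jobs": {"nightly_backup"},
    }
    for category, names in missing_on_secondary(primary, secondary).items():
        print(f"{category}: {sorted(names)} missing on secondary")
```

Matching on names alone is a simplification: a SQL Server-authenticated login must also have the same SID on each replica, or the database users mapped to it become orphaned after a failover.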

Weighing the Trade-Offs

Certain third-party products and Platform-as-a-Service options take different approaches to ensuring the continuity of SQL Server in the event of an unexpected outage, but Storage Spaces Direct and Always On Availability Groups are the key native approaches for SQL Server running on today's cloud IaaS platforms. You'll need to weigh your requirements to determine whether either of these, or a different approach altogether, will provide the continuity your organization needs.

A look back over just the past few years reminds us that the cloud does not hover so high above the ground that an earthquake or a hurricane can't deliver a debilitating blow. Offloading your infrastructure to the cloud may free you from worrying about day-to-day infrastructure events, but if you want to ensure service continuity, you still need to develop plans for dealing with unexpected outages.
