Running a Database-as-a-Service in Kubernetes


As companies tackle digital transformation, Kubernetes has emerged as a key technology driver. Jiten Vaidya, CEO and Co-Founder of PlanetScale, explains how enterprises can successfully use the platform to run databases.

As companies create and implement strategies to tackle digital transformation, Kubernetes has emerged as a key technology driver. By abstracting away cloud-specific requirements and allowing engineering teams to focus on feature development and uniform deployment, Kubernetes is quickly becoming the de facto operating system for compute resources, whether in the data center or in the public cloud. A whole ecosystem of tools has emerged around Kubernetes, enabling the move from data centers to the cloud.

The efficiencies these tools bring in terms of uniformity and ease of deployment have not been extended to databases, because of the perception that stateful workloads such as databases are hard to run in Kubernetes. It is worth examining that perception, because running databases in Kubernetes brings even more benefits than running application servers there does.

These benefits include: 

  • Efficiency: Databases become generic units of work that Kubernetes can pack onto existing machines just like other workloads, providing a uniform deployment platform for all applications.  
  • Performance: Improving latency by running databases adjacent to microservices.
  • Reliability: The operator pattern handles software upgrades, hardware failures, and network partitions without application downtime. 
  • Security: The data stays within the organization’s network perimeter, ensuring the security policies are met. 
  • Ease of Operations: Databases can be deployed and managed as easily as a database-as-a-service.

Many companies are already aware that running in Kubernetes can provide these benefits for their stateless applications. In this article, we’ll think through what it takes to bring those same benefits to stateful workloads, in particular databases.

How To Run a Database-as-a-Service in Kubernetes?

When thinking about a database-as-a-service, it’s important to consider two main components. The first is the database functionality, which provides the ability to save and retrieve data with durability and availability guarantees, and offers support for frameworks that developers want to use. 

The second consideration is support for administrative functions such as deploying the databases, monitoring the databases, applying schema changes, backing up and restoring databases, and exporting the data to analytics systems as needed. 

As we start thinking about such a service deployed in a Kubernetes cluster, we need to understand how Kubernetes as a platform differs from more traditional platforms for running a database. Traditionally, databases run on customized machines with many CPUs, large amounts of RAM, and large, fast disks (SSDs or NVMe drives these days). The assumption is that these hosts will keep running for many months without a reboot, providing the guarantee that a database server running on them will continue uninterrupted over long periods of time.

In contrast, on an orchestration framework like Kubernetes, any pod’s longevity cannot be taken for granted. The orchestration system might decide to deschedule a pod for any reason at any time. Thus, it becomes critical that the database system guarantees its availability in the face of the master pod going away at short notice. 
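Concretely, the database pod should treat termination as routine. Below is a minimal Go sketch of how a database process or its sidecar might react to the SIGTERM Kubernetes sends before descheduling a pod; the `demoteAndDrain` helper is a hypothetical stand-in for database-specific handoff logic.

```go
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Kubernetes sends SIGTERM, then waits terminationGracePeriodSeconds
	// before killing the pod. Use that window to hand off cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	<-ctx.Done() // the pod is being descheduled

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	// demoteAndDrain is hypothetical: demote this instance if it is the
	// master, wait for replicas to catch up, and close client connections.
	if err := demoteAndDrain(shutdownCtx); err != nil {
		log.Printf("graceful handoff failed: %v", err)
	}
}

// demoteAndDrain is a placeholder for database-specific shutdown logic.
func demoteAndDrain(ctx context.Context) error {
	return nil
}
```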

Kubernetes provides persistent volumes as a mechanism that offers data persistence beyond the lifetime of a pod. But running a database in a single pod (single copy of data) is as dangerous as running a single database instance outside Kubernetes. Typically, it is advisable to run multiple replicas (multiple copies of data).
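A StatefulSet is the usual Kubernetes building block for this: each replica gets a stable identity and its own persistent volume. The following is a minimal sketch using the client-go API types; the names (`mysql-cluster`, the 100Gi request) are placeholders, and the PVC `Resources` field shown is `VolumeResourceRequirements` in k8s.io/api v0.29+ (it is `ResourceRequirements` in older releases).

```go
package dbaas

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func databaseStatefulSet() *appsv1.StatefulSet {
	replicas := int32(3) // one master plus two replicas
	labels := map[string]string{"app": "mysql-cluster"}

	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "mysql-cluster"},
		Spec: appsv1.StatefulSetSpec{
			Replicas:    &replicas,
			ServiceName: "mysql-cluster", // headless service for stable DNS names
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "mysql",
						Image: "mysql:8.0",
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "data",
							MountPath: "/var/lib/mysql",
						}},
					}},
				},
			},
			// Each replica gets its own PersistentVolumeClaim, so data
			// survives pod rescheduling.
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
					Resources: corev1.VolumeResourceRequirements{
						Requests: corev1.ResourceList{
							corev1.ResourceStorage: resource.MustParse("100Gi"),
						},
					},
				},
			}},
		},
	}
}
```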

This means that an orchestration layer is needed to keep track of the masters and replicas for the database and elect one of the replicas to be the new master, should the master pod go away. This layer should also set up the replication topology correctly so that the rest of the replicas are now replicating from the new master. 
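The core of that orchestration logic is a failover routine. Here is a deliberately simplified Go sketch; every helper (`masterHealthy`, `mostCaughtUpReplica`, `promote`, `repointReplicas`, `publishMaster`) is a hypothetical stub standing in for database-specific and topology-service calls.

```go
package dbaas

import (
	"context"
	"log"
	"time"
)

// watchAndFailover monitors the current master and, if it becomes
// unreachable, promotes the most caught-up replica and repairs the
// replication topology.
func watchAndFailover(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if masterHealthy(ctx) {
				continue
			}
			candidate, err := mostCaughtUpReplica(ctx)
			if err != nil {
				log.Printf("no promotable replica: %v", err)
				continue
			}
			if err := promote(ctx, candidate); err != nil {
				log.Printf("promotion failed: %v", err)
				continue
			}
			// Point the remaining replicas at the new master and record
			// its address in the metadata service for service discovery.
			repointReplicas(ctx, candidate)
			publishMaster(ctx, candidate)
		}
	}
}

// The functions below are stubs standing in for real database and
// topology-service calls.
func masterHealthy(ctx context.Context) bool                  { return true }
func mostCaughtUpReplica(ctx context.Context) (string, error) { return "replica-0", nil }
func promote(ctx context.Context, replica string) error       { return nil }
func repointReplicas(ctx context.Context, newMaster string)   {}
func publishMaster(ctx context.Context, newMaster string)     {}
```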

Now, the application needs to know the IP address and port number for the new master database to continue to read and write without interruption. This can be solved by a combination of a stateless proxy that the application connects to and a metadata service to which the orchestration layer publishes the IP address and port number of the new master. These features combined enable service discovery. 
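In its simplest form, such a proxy just forwards client connections to whatever address the metadata service currently lists as the master. The stripped-down TCP sketch below uses only the Go standard library; `currentMasterAddr` is a hypothetical stand-in for a metadata-service lookup.

```go
package main

import (
	"io"
	"log"
	"net"
)

// currentMasterAddr is a hypothetical lookup against the metadata service
// where the orchestration layer publishes the master's address and port.
func currentMasterAddr() string {
	return "mysql-cluster-0.mysql-cluster:3306" // placeholder
}

func main() {
	// The application always connects to the proxy; the proxy resolves the
	// real master on every new connection, so failovers stay invisible.
	ln, err := net.Listen("tcp", ":3306")
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Printf("accept: %v", err)
			continue
		}
		go func(client net.Conn) {
			defer client.Close()
			master, err := net.Dial("tcp", currentMasterAddr())
			if err != nil {
				log.Printf("dial master: %v", err)
				return
			}
			defer master.Close()
			go io.Copy(master, client) // client -> master
			io.Copy(client, master)    // master -> client
		}(client)
	}
}
```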

For all of these pieces to work together, the system needs one more attribute: observability. An external observer of the system should be able to easily answer questions such as: what is the write QPS, what is the read QPS, how often are masters failing over, and how long did the last schema change take to run. Building good observability into all system components and making it consumable by excellent open-source tools such as Prometheus and Grafana completes the picture.
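As a sketch of what instrumenting those questions can look like in Go with the Prometheus client library (the metric names such as `db_queries_total` are illustrative, not a standard):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Labelled by query kind, so read and write QPS can be graphed
	// separately (e.g. rate(db_queries_total[1m]) in PromQL).
	queries = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "db_queries_total",
			Help: "Queries served, labelled by kind (read or write).",
		},
		[]string{"kind"},
	)
	failovers = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "db_failovers_total",
		Help: "Master failovers performed by the orchestration layer.",
	})
	schemaChangeSeconds = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "db_schema_change_duration_seconds",
		Help: "Wall-clock duration of schema changes.",
	})
)

func main() {
	prometheus.MustRegister(queries, failovers, schemaChangeSeconds)

	// Record metrics from the relevant code paths, e.g.:
	//   queries.WithLabelValues("write").Inc()
	//   failovers.Inc()
	//   schemaChangeSeconds.Observe(42.0)

	// Expose /metrics for Prometheus to scrape; Grafana dashboards are
	// then built on top of the scraped series.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9104", nil)
}
```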

Using the Operator Pattern To Run Databases Safely in Kubernetes

The state of the art for implementing the orchestration layer is the operator pattern, invented by CoreOS, now a part of Red Hat. An operator implements the scaffolding that stitches together a collection of cooperating services and allows users to define them through higher-level declarative configuration.

The operator does the job of translating these higher-level specifications into the primitives that Kubernetes understands, such as pods, services, and secrets, and then deploys, monitors, and manages the resulting collection of services. The operator pattern is thus ideally suited for implementing the components of a database-as-a-service that we discussed above.
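For example, the user-facing side of a hypothetical database operator might be nothing more than a small declarative spec, which the operator's reconcile loop then translates into StatefulSets, Services, Secrets, and backup jobs. The sketch below assumes the controller-runtime library; the type and field names are illustrative, and the deep-copy methods kubebuilder normally generates are omitted.

```go
package dbaas

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DatabaseClusterSpec is the higher-level, declarative configuration a
// user writes; everything below it is derived by the operator.
type DatabaseClusterSpec struct {
	Replicas       int32  `json:"replicas"`       // master plus replicas
	StorageGi      int32  `json:"storageGi"`      // per-replica volume size
	Version        string `json:"version"`        // database engine version
	BackupSchedule string `json:"backupSchedule"` // cron expression
}

// DatabaseCluster is the custom resource; deep-copy methods that
// kubebuilder would generate are omitted from this sketch.
type DatabaseCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DatabaseClusterSpec `json:"spec"`
}

// DatabaseClusterReconciler turns the declared spec into Kubernetes
// primitives and keeps them converged.
type DatabaseClusterReconciler struct {
	client.Client
}

func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the DatabaseCluster object named in req.
	// 2. Create or update the StatefulSet, headless Service, proxy
	//    Deployment, and credential Secrets derived from its spec.
	// 3. Create a backup CronJob from Spec.BackupSchedule.
	// 4. Update status and requeue if anything has not yet converged.
	return ctrl.Result{}, nil
}
```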

Running a database master and replicas in a set of pods without an orchestration layer would be worse than running them outside of Kubernetes. With an operator, a proxy, and the added observability, you have much more than just a database: you have a platform that can function as a database-as-a-service.

Kubernetes has been established as the platform for running stateless applications. We are convinced that it can also serve as a platform for running databases. The well-established operator pattern for Kubernetes provides the functionality required to safely run databases in Kubernetes with the additional benefits of a unified infrastructure. Since the databases can run adjacent to the application within Kubernetes, they also benefit from improved performance, higher reliability, and greatly reduced management overhead.

Did you enjoy reading this article on Kubernetes? Let us know your thoughts on LinkedIn, Twitter, or Facebook. We would love to hear from you!