The Promise of Cloud Native: Does it Rely on Deploying Cloud Solutions Into Kubernetes?

Rob Whiteley, GM, NGINX, explains how, as companies move to cloud native, DevOps and application teams can pick their own solutions and deploy them into Kubernetes, which in theory makes applications more resilient and scalable. But if the move is rushed, it can be the equivalent of running with scissors, and teams can get hurt.

Digital transformation has given way to cloud native transformation in the land of tired cliches. It’s not enough to go digital anymore. Now companies must leapfrog to cloud native. By this, they mean building distributed systems using microservices and automated DevOps practices. It almost always means these components run in containers orchestrated by Kubernetes. All applications must be decomposed into groups of services, each of which is managed as a standalone application, fronted by an API, with its own business logic, policies and security rules.

In theory, this is amazing. It makes applications more resilient and scalable and allows developers to innovate and iterate at a faster (or slower) pace, as suits them. Cloud native enables everyone to get along more easily. Networking teams, security teams, SREs, and API owners can all control their destiny and develop at their own pace, with their own solutions and tools. Part of the promise of cloud native is allowing application teams and service owners to pick their own solutions and deploy them into Kubernetes. New infrastructure components like a service mesh ensure delivery, resilience, security and observability. 

In practice, it leads to running with scissors. Out of the box, Kubernetes has few guardrails, and containers make it easy for developers to deploy anything they want, regardless of the consequences. There are now over 1,000 open source projects in the cloud native universe, all competing for mindshare in the Kubernetes orbit. In reality, not all cloud native software solutions and infrastructure components enjoy the same level of support in Kubernetes.

Likewise, not all cloud native software solutions offer the feature parity and compatibility required for truly seamless interoperability, easy observability, and robust security. The result can be unmanageable, dangerous tool sprawl. When developers adopt a cornucopia of solutions, cloud native deployments become insanely complex, hard to manage, and insecure, both for service owners and for platform operations teams.

When things go wrong with Kubernetes, it can be hard to troubleshoot. In theory, Kubernetes fails over gracefully. In practice, a Kubernetes failure in production may take a half hour or longer to roll back and restart. In other words, developers and platform teams rushing into a cloud native architecture may be running with scissors, and they can get hurt.

The History of Virtualization Provides a Map to Sanity

Fortunately, we are starting to see a path to resolving this problem. History can be our guide. The trajectory of Kubernetes, containers and service mesh right now is mirroring the adoption curve of virtualization software twenty years ago. This is the same evolution IT saw in the shift from mainframe to client-server, and then from client-server to desktops connected via networks.

We can learn from these cycles. A good model for delivering on the promise of cloud native is to structure your strategy to mirror the evolution of orchestration tools and operations best practices from the previous change cycle. So let’s take a walk down memory lane to the early days of virtualization.

Developers first fired up virtual machines on laptops as development environments so they could program and test code without having to reserve expensive time on servers or deploy their own boxes. Later, corporations adopted VMs for server consolidation. Over time, this led to a host of new operating models for computers, as well as a host of new products. CTOs and IT leaders realized they could treat VMs as fungible resources that could be moved across geographies or sharded and mirrored easily to deliver high availability, consistency, or any other infrastructure characteristic.

Networking, security, storage management, policy enforcement and access control all had to be refactored to work in virtual environments. All of these products produced epic tool sprawl and chaos. Networking teams, security teams, DBAs and IT teams all struggled to keep up with developers choosing their own tools and solutions to suit their specific environment and application needs. 

Something had to give. And then came ESX (and later, vSphere), XenServer, and other virtualization management tools. The goal of these tools was to collapse tool sprawl and create catalogs that developers could easily pick from in building their environments. This preserved choice while reducing complexity to a manageable level. For admins, these virtualization management tools collapsed many previously standalone functions into a single platform.

Virtualization, combined with new software-defined infrastructure tools like Chef and Puppet, allowed the VM to provide not only compute but also networking, security, monitoring and data handling, all in the same machine image. Virtualization managers gradually assumed all these tasks, simplifying life not only for developers but also for corporations. The rote parts of networking and security were automated, leaving the more complex tasks for humans on those teams.

So after a decade, we arrived at a better place where developers could get what they want more easily and quickly but without creating giant operational headaches and risks. The logical end state was the arrival of the public clouds, which commoditized virtualization management and capabilities to such a degree that it became something organizations simply stopped worrying about and outsourced to their cloud provider. 

Docker, Containers, and Cloud Native

We are entering the early-middle phases of that same technology cycle with containers and Kubernetes. VMs simplified the deployment of servers but didn’t abstract away all the complexity of running servers as part of applications or in networks. Containers remove the remaining complexity of managing and configuring VMs, and traffic coming into and out of containers does not traverse a big OS stack. This makes sense. Developers don’t want to think about low-level details like MAC addresses, Ethernet frames or NICs. They would rather think in terms of APIs, URIs and programmable infrastructure at Layer 7.

Containers alone are not sufficient to get there. You also need an orchestration plane like Kubernetes that commoditizes all networking up through Layer 4. Every developer running containers on Kubernetes can assume they are backed by a big, flat “pipe” at the lower layers of the networking stack. Fifteen years ago, in the era of VMs, architects were engineering around unreliable network connections; today, containers and Kubernetes enjoy direct connections to fiber backhaul in compute clouds. With Kubernetes, the proxy has replaced the switch as the intelligent control point. This allows developers to be less concerned about moving packets and more focused on describing the desired behavior of their applications.
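
To make “describing desired behavior” concrete, here is a minimal sketch using the official Kubernetes Python client. The deployment name and image are illustrative, and it assumes a kubeconfig is available locally. The developer declares that three replicas of a container image should exist; Kubernetes takes care of scheduling, networking, and restarts.

```python
from kubernetes import client, config

# Assumes a local kubeconfig (e.g., from a dev or test cluster).
config.load_kube_config()
apps = client.AppsV1Api()

# Desired state, declared as data: "three replicas of this image exist."
# The name and image below are illustrative.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "hello-web"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "hello-web"}},
        "template": {
            "metadata": {"labels": {"app": "hello-web"}},
            "spec": {
                "containers": [
                    {"name": "web", "image": "nginx:1.25",
                     "ports": [{"containerPort": 80}]}
                ]
            },
        },
    },
}

apps.create_namespaced_deployment(namespace="default", body=deployment)
```

Nothing in that declaration mentions switches, subnets or MAC addresses; the orchestrator reconciles the cluster toward the declared state.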

Just as in the early days of virtualization, Docker containers were first deployed on small cloud instances and laptops by developers looking to write and test code without overhead, and to avoid the heavy costs and infrastructure required to run standard VMs in earlier generations of the public cloud. The abstraction of containers was also attractive. With traditional VMs, developers had to be aware of key elements of the operating system and networking stack when coding applications. Docker removed that need.

Then containers began to show up in the data center and in production applications in numbers that dwarfed VMs. Spinning up a container was cheap, easy and fast. Everyone started doing it, resulting in some epic fails as container-based applications performed in unexpected ways and teams struggled to deliver resiliency and reliability as developers moved faster and faster.  

This brought next-level complexity in terms of management and orchestration requirements, and the need for Kubernetes. But Kubernetes alone was not enough to make it safe to run with scissors in cloud native land. As the core networking and orchestration layer, it enabled the coordination necessary to scale containers, but it did not supply enterprise capabilities such as security, resiliency, high availability, and compliance with rules and policies.

Kubernetes alone could not stop application teams from “going rogue,” often inadvertently. Because Kubernetes pushed so much control of Layer 7 to service owners, it actually made it easier for them to hurt themselves and their fellow service owners; a configuration change could take down an entire shared cluster. A retry interval set too short or a rate limit set too high could result in one service effectively DDoSing another, with nothing to prevent the catastrophe.
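
The retry failure mode is easy to demonstrate outside any mesh. Below is a hedged Python sketch (the URL and numbers are hypothetical) of the principle a sane retry policy encodes: retrying instantly on every failure multiplies traffic against a service that is already struggling, while exponential backoff with jitter spreads the retry wave out.

```python
import random
import time
import urllib.error
import urllib.request

def call_with_backoff(url: str, attempts: int = 4, base_delay: float = 0.25) -> bytes:
    """Call a service, retrying with exponential backoff plus jitter.

    A naive loop that retries immediately turns every failure into more
    load: 100 clients each retrying 5 times instantly means 500 requests
    hammering a service that is already down -- a self-inflicted DDoS.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # Give up and surface the error to the caller.
            # Wait 0.25s, 0.5s, 1s, ... plus jitter so clients desynchronize.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Hypothetical in-cluster service URL:
# body = call_with_backoff("http://inventory.default.svc.cluster.local/items")
```

A service mesh lets platform teams set these budgets (attempts, timeouts, circuit breaking) as policy, instead of trusting every caller to get it right.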

To take Kubernetes and cloud native containerized applications to true enterprise grade, we have to reduce the risks without reducing application development agility and speed. We want developers to run with scissors, but only if our ops teams can protect them with bubble wrap and maybe dull the blades a bit. Some solutions are heading in this direction: Red Hat OpenShift and SUSE Rancher overlay atop Kubernetes to provide protections, policies, and ease of management.

Just as we saw with virtualization management and orchestration solutions, these and other overlays consolidate multiple admin roles (security, networking, data management) into a single platform ops or service ops role. This shifts responsibilities further left, closer to the developer. In virtualization, networking, security and operations were consolidated into one admin role. With Kubernetes, these responsibilities are being pushed all the way out to the service owners and application developers, further flattening responsibilities to increase agility and autonomy.

Kubernetes allows for this consolidation by breaking down silos and providing a single operations platform where all these activities can be managed. Google led the way with its foundational work on Site Reliability Engineering and best practices for containers, which extended into Kubernetes. In the past few years, another layer of management, the service mesh, has emerged as an important way to ensure not only that containers are running but that the applications running in those containers are properly meshed, monitored, and secured.

Using virtualization history as our guide, we know what comes next – more consolidation of capabilities and more comprehensive platforms for managing containers through Kubernetes and service meshes. The plumbing is being built – the translation layers for protocols, the security tooling that can protect not only the perimeter but also the interior and East-West traffic, the solutions that can deliver ACID-grade data handling on clusters and containers that are constantly moving. 

For platform teams and operators, this means Kubernetes and containers will get a lot easier. That said, few can afford to wait for this golden future. If you want to run Kubernetes in production and take the edge off those scissors, then yes, test drive management and policy overlays and service meshes. But also take these common-sense steps, which mirror the trajectory of virtualization and build a best-effort safety net for your team.

  • Acknowledge that tool sprawl induces unsustainable complexity

Yes, developers like to pick all their own tools and programs. Yes, containers make it very easy for them to spin up their own favorite things. But the more tools running, the more complexity is created, both in managing tools and applications and in root-causing failures and persistent problems. Getting everyone to understand that too many tools is the enemy of reliable Kubernetes production is step 1. You can give developers freedom to pick from certain tools if you make it easy for them to comply. Make it hard, and they’ll go rogue.

  • Educate your application teams on their new responsibilities

Developers would rather not handle security, networking and service reliability. But those will be their roles in the cloud native world, so the job of platform teams is to help service owners and application teams get up to speed in these new roles. It may be helpful to create a curriculum and formalize the process, as well as to codify what is expected of them in this new, flatter cloud native reality. But don’t fret too much. Advances in infrastructure as code mean these skills can be applied using the universal language of REST APIs and integrated into CI/CD frameworks, as the sketch below illustrates. Provide the skills and automate their application.
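
As a hedged illustration of what that looks like in practice, the sketch below encodes one reliability expectation, that no production deployment runs a single replica, as a CI gate against the Kubernetes REST API. The environment variable names are hypothetical; in a real pipeline they would come from the CI runner’s service account or a secrets manager.

```python
import os
import requests

# Hypothetical settings; a real pipeline injects these securely.
API_SERVER = os.environ["K8S_API_SERVER"]  # e.g. https://10.0.0.1:6443
TOKEN = os.environ["K8S_TOKEN"]            # service account bearer token

resp = requests.get(
    f"{API_SERVER}/apis/apps/v1/namespaces/production/deployments",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=os.environ.get("K8S_CA_CERT", True),  # cluster CA bundle, if provided
    timeout=10,
)
resp.raise_for_status()

# Fail the pipeline if any deployment is a single point of failure.
for item in resp.json()["items"]:
    name = item["metadata"]["name"]
    replicas = item["spec"].get("replicas", 1)
    assert replicas >= 2, f"{name} runs only {replicas} replica; not HA"
print("All production deployments meet the replica policy.")
```

The same pattern extends to any policy you can express over the API: resource limits, image provenance, label hygiene.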

  • Create catalogs and playbooks to allow for choice but reduced complexity

Standardizing on one tool for the job is often not possible, because that robs cloud native application development of autonomy and blunts the benefits of the new paradigm. But allowing 20 tools for, say, load balancing or data stores is overkill. There is a happy medium, and your team can help you decide where it is and what is critical to include in the supported catalog. This may sound like a registry of approved Docker images, but that is only half the battle. Catalogs should also be attached to playbooks for observability and monitoring, for scaling up and down, and for establishing different grades of security. The catalogs set the tools, the playbooks set the rules; a sketch of how a catalog rule can be enforced follows.
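
As a minimal sketch of the catalog half, here is a hypothetical allowlist check that flags any container image pulled from outside the approved registries. The registry names are invented; in a cluster, the same rule is typically enforced by an admission controller such as OPA Gatekeeper or Kyverno, but running it in CI as well gives developers faster feedback.

```python
# Hypothetical approved registries; yours come from the supported catalog.
APPROVED_REGISTRIES = ("registry.example.com/", "ghcr.io/example-org/")

def rogue_images(pod_spec: dict) -> list[str]:
    """Return images in a pod spec that are not from an approved registry."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [
        c["image"]
        for c in containers
        if not c["image"].startswith(APPROVED_REGISTRIES)  # tuple prefix match
    ]

# One image from the sanctioned catalog, one rogue pull from Docker Hub.
spec = {
    "containers": [
        {"name": "app", "image": "registry.example.com/payments:1.4.2"},
        {"name": "sidecar", "image": "docker.io/someuser/debug-tools:latest"},
    ]
}
print(rogue_images(spec))  # ['docker.io/someuser/debug-tools:latest']
```

The playbook half then says what happens when the check fails: who gets paged, how an exception is requested, and which grade of security review applies.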

Conclusion: For Faster, Better K8s, Less Is More

Reality check. Cloud native shifts decision-making power to developers and application teams. People will still try to go rogue and run with scissors. And, truthfully, higher speed usually creates greater risk, because you have less time to spot and correct errors before a catastrophic failure tips over your cluster. That said, in many instances, the best tools are nearly comparable in features, capability and capacity. The guardrails you put in place by limiting sprawl, providing choice but not chaos, will actually accelerate development and adoption by making your clusters more reliable and, fittingly, developer-friendly. In the end, developers stopped tuning VMs or worrying about managing NICs and MACs; they left that to their VM admin teams. In cloud native, developers will appreciate being able to focus on building applications quickly on a stable foundation, leaving the container quicksand to their platform ops teams.
