A Deep Dive Into Kubernetes Monitoring


Application monitoring is one of the most effective ways to discover bottlenecks and predict problems in production. However, it is also one of the biggest challenges. This article by Gilad David Maayan, Founder & CEO of Agile SEO, discusses the importance of Kubernetes monitoring, including key metrics and tools.

The Importance of Kubernetes Monitoring

Kubernetes is a popular container orchestration tool that can manage containers across multiple machines. Its key advantage is that it simplifies the deployment and operation of distributed applications.

Kubernetes monitoring can help you improve resource utilization, management, and cost control. To ensure that pods make full use of the underlying node resources, you have to actively monitor your clusters, containers, and the resource allocation of each namespace.
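
As a starting point, the sketch below uses the official Kubernetes Python client to list each node's allocatable CPU and memory alongside the number of pods scheduled on it. It assumes a local kubeconfig with read access to the cluster and is meant only as an illustration.

  from kubernetes import client, config

  config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
  core = client.CoreV1Api()

  for node in core.list_node().items:
      alloc = node.status.allocatable  # dict of resource name -> quantity string
      pods = core.list_pod_for_all_namespaces(
          field_selector=f"spec.nodeName={node.metadata.name}").items
      print(f"{node.metadata.name}: allocatable cpu={alloc['cpu']} "
            f"memory={alloc['memory']}, pods scheduled={len(pods)}")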

Kubernetes Monitoring Metrics

There are a lot of Kubernetes metrics to monitor. Generally, they are separated into three main components—cluster monitoring, pod monitoring, and etcd monitoring.

Cluster monitoring

The objective of cluster monitoring is to ensure the health of all Kubernetes clusters. Monitoring enables administrators to maintain proper cluster operation and capacity. In addition, administrators can monitor applications running on each node and the resource utilization of the cluster.

Main cluster metrics:

  • Node resource utilization—metrics like network bandwidth, disk utilization, memory, and CPU utilization. Using these metrics, you can discover if you need to change the number and size of cluster nodes.
  • Available nodes—the number of available nodes enables you to evaluate the performance of the cluster and understand what it is being used for (a readiness check example follows this list).
  • Number of pods—the number of running pods on each node shows how Kubernetes handles each deployment. For example, you can see if the number of available nodes is enough to handle the entire workload in case of a failure.
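
To track the available-nodes metric mentioned above, a minimal check of each node's Ready condition is enough. The sketch below uses the same Python client and kubeconfig assumption as the earlier example.

  from kubernetes import client, config

  config.load_kube_config()
  core = client.CoreV1Api()

  nodes = core.list_node().items
  ready = [n.metadata.name for n in nodes
           if any(c.type == "Ready" and c.status == "True"
                  for c in n.status.conditions)]
  print(f"{len(ready)} of {len(nodes)} nodes are Ready: {ready}")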

Pod monitoring

Pod monitoring is separated into three general categories: container metrics, application metrics, and Kubernetes metrics.

You can monitor the following information using Kubernetes metrics:

  • Number of instances—compares the expected number of pod instances to the actual number. If the actual number is lower than expected, your cluster may be out of resources (see the sketch after this list).
  • The deployment process—compares the number of changed instances in the new version to the number of instances in the older version.
  • Network data—monitors the available network data and performs health checks.
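
To illustrate the number-of-instances check, the following sketch compares the desired and ready replica counts of every Deployment and flags any shortfall. It uses the official Kubernetes Python client and assumes a kubeconfig with read access.

  from kubernetes import client, config

  config.load_kube_config()
  apps = client.AppsV1Api()

  for dep in apps.list_deployment_for_all_namespaces().items:
      desired = dep.spec.replicas or 0
      ready = dep.status.ready_replicas or 0
      if ready < desired:
          print(f"{dep.metadata.namespace}/{dep.metadata.name}: "
                f"only {ready}/{desired} replicas ready")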

For container metrics, you can use the cAdvisor tool to analyze container resource usage and performance characteristics. In addition, you need the Metrics Server to aggregate the resource usage data (a short example of reading the aggregated figures follows the list below).

You can monitor the following container metrics:

  • Container CPU utilization—counts the total amount of time spent in the kernel and the time spent outside the kernel.
  • Memory utilization—cAdvisor memory metrics include page cache memory in bytes, RSS size, container swap usage, current memory usage and more.
  • Disk Utilization—cAdvisor tracks disk utilization by analyzing both input and output bytes.
  • Network Utilization—you can choose between measuring in bytes or packets for both incoming and outgoing traffic. These metrics show the network utilization by pod name, for each pod.
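
If Metrics Server is installed, the aggregated per-pod CPU and memory figures behind the list above can be read from the metrics.k8s.io API. The following is a minimal sketch using the official Kubernetes Python client; it assumes Metrics Server is running in the cluster.

  from kubernetes import client, config

  config.load_kube_config()
  custom = client.CustomObjectsApi()

  # Per-pod usage figures aggregated by Metrics Server (group metrics.k8s.io).
  pod_metrics = custom.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "pods")
  for pod in pod_metrics["items"]:
      for container in pod["containers"]:
          usage = container["usage"]
          print(f"{pod['metadata']['namespace']}/{pod['metadata']['name']}"
                f"/{container['name']}: cpu={usage['cpu']} memory={usage['memory']}")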

Etcd monitoring

Etcd is a reliable key-value store designed for distributed systems or clusters of machines, and it is a core component of the Kubernetes control plane. Because Kubernetes itself is a distributed system, it needs a distributed data store like etcd. The purpose of etcd is to store all of the cluster's data, such as configuration, state, and metadata.

Etcd exposes its metrics in Prometheus format. You can use these metrics for real-time monitoring or debugging. Note that etcd metrics are not persistent; they are reset every time etcd restarts.
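
As an illustration, the sketch below scrapes etcd's /metrics endpoint with Python's requests library and prints the WAL fsync latency series, a common disk health indicator. The endpoint URL and certificate paths are assumptions and will differ between clusters.

  import requests

  # Hypothetical endpoint and certificate paths; adjust them for your cluster.
  resp = requests.get(
      "https://127.0.0.1:2379/metrics",
      cert=("/etc/kubernetes/pki/etcd/client.crt",
            "/etc/kubernetes/pki/etcd/client.key"),
      verify="/etc/kubernetes/pki/etcd/ca.crt",
  )
  # Print only the WAL fsync duration histogram lines.
  for line in resp.text.splitlines():
      if line.startswith("etcd_disk_wal_fsync_duration_seconds"):
          print(line)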

You can monitor the following etcd metrics:

  • Server metrics—describe the etcd server status. You should closely monitor the metrics of every etcd cluster in production. By doing so, you can detect problems or outages.
  • Disk metrics—describe the status of disk operations. High disk operation latency usually indicates disk issues; it makes the cluster unstable and causes high request latency.
  • Network metrics—counts the total number of sent and received bytes.
  • Debugging metrics—measures the total latency of snapshots. High snapshot duration makes the cluster unstable and indicates disk issues.
  • Prometheus client library metrics—provide metrics about file descriptor usage. High file descriptor usage indicates potential exhaustion; if file descriptors are exhausted, etcd cannot create new WAL files.

Learn More: 5 Ways to Streamline Kubernetes Alerting

Methods for Collecting Kubernetes Metrics

The system should handle metrics collection in the same way, and with the same reliability, for the entire cluster, even if the nodes are deployed in different locations or in a hybrid cloud.

Using DaemonSets

Usually, it does not matter where your Kubernetes pods are running. However, sometimes you want to run a single pod on all your nodes. A DaemonSet is a Kubernetes workload resource that ensures all (or selected) nodes run exactly one copy of a given pod.

One approach to monitoring all cluster nodes is to create a DaemonSet that deploys a monitoring agent on every cluster node. Many monitoring solutions use this structure to monitor Kubernetes; each tool ships its own agent, and there is no general solution that fits all scenarios.
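
The sketch below shows the idea in practice: it creates a DaemonSet that runs a placeholder monitoring agent on every node. The image name and the monitoring namespace are hypothetical; a real monitoring tool would ship its own manifest.

  from kubernetes import client, config

  config.load_kube_config()
  apps = client.AppsV1Api()

  # Hypothetical agent image and namespace, for illustration only.
  daemonset = {
      "apiVersion": "apps/v1",
      "kind": "DaemonSet",
      "metadata": {"name": "monitoring-agent"},
      "spec": {
          "selector": {"matchLabels": {"app": "monitoring-agent"}},
          "template": {
              "metadata": {"labels": {"app": "monitoring-agent"}},
              "spec": {"containers": [{"name": "agent",
                                       "image": "example.com/monitoring-agent:latest"}]},
          },
      },
  }
  apps.create_namespaced_daemon_set(namespace="monitoring", body=daemonset)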

Using Metrics Server

Metrics Server is a cluster-wide resource usage aggregator inspired by Heapster. It is a core component of the Kubernetes monitoring architecture and is typically deployed as a cluster add-on.

The server scrapes resource metrics from the Kubelet on every node through the Summary API, then aggregates, stores, and exposes them through the Metrics API. Only the most recent value of each metric is kept; to access historical data, you can archive the metrics or use a third-party monitoring solution.
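
Once Metrics Server is running, the aggregated values can be read from the metrics.k8s.io group, which is what kubectl top queries under the hood. The sketch below prints current CPU and memory usage per node, assuming Metrics Server is installed and a kubeconfig is available.

  from kubernetes import client, config

  config.load_kube_config()
  custom = client.CustomObjectsApi()

  # Node usage as exposed by Metrics Server through the Metrics API.
  node_metrics = custom.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
  for item in node_metrics["items"]:
      usage = item["usage"]
      print(f"{item['metadata']['name']}: cpu={usage['cpu']} memory={usage['memory']}")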

Learn More: Using Kubernetes Certifications to Empower your Enterprise

Conclusion

Kubernetes makes your applications run seamlessly. However, that does not eliminate the need to keep an eye on the operation of K8s. The responsible thing to do is to deploy a monitoring tool that keeps you informed and helps you to make data-driven decisions.

Tools like Grafana combine multiple metrics into useful dashboards, for example one dashboard for cluster monitoring and another for pod monitoring. Prometheus is another solution for monitoring Kubernetes metrics, and it provides several tools, frameworks, and APIs.
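
For example, if a Prometheus server is already scraping the cluster, its HTTP API can be queried directly. The sketch below runs a PromQL query for per-pod CPU usage; the Prometheus address is a hypothetical in-cluster service name and should be adjusted for your environment.

  import requests

  # Hypothetical in-cluster Prometheus address; adjust for your environment.
  PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"
  query = "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"

  resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
  for result in resp.json()["data"]["result"]:
      print(result["metric"].get("pod", "<unknown>"), result["value"][1])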

There are many other solutions. Be sure to experiment with free tiers and open source tools before introducing a new solution into your environment.