Google announced the general availability of Kubernetes control plane metrics in Google Kubernetes Engine (GKE). These metrics are integrated directly into Google Cloud Monitoring, providing a one-stop solution for troubleshooting issues with GKE. Integration with third-party observability tools is also possible through the Cloud Monitoring API.
Although GKE fully manages the Kubernetes control plane, the newly exposed metrics can be useful for troubleshooting. For example, a combination of metrics can help in assessing the health of the API server. This includes using metrics such as apiserver_request_duration_seconds to track the load experienced by the API server, the number of requests returning errors, and request response latency.
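As an illustration of how a latency figure can be derived from a histogram metric such as apiserver_request_duration_seconds, the sketch below estimates a quantile from cumulative bucket counts using linear interpolation, roughly the approach PromQL's histogram_quantile takes. The function name and the sample bucket values are invented for illustration and are not taken from the announcement.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound_seconds, cumulative_count) pairs, with the
    last upper bound being +Inf, mirroring Prometheus "le" bucket series.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or ordered[-1][1] <= 0:
        return math.nan  # no observations recorded
    rank = q * ordered[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Quantile lies in the +Inf bucket (or an empty bucket):
                # fall back to the last known finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample: 100 requests spread over four latency buckets.
SAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

print(f"estimated p95 latency: {estimate_quantile(0.95, SAMPLE_BUCKETS):.3f}s")
```

In practice the same calculation would normally be left to PromQL; the sketch only shows what the resulting number means.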
The newly available metrics can also help diagnose scheduling problems. The following metrics can all be used to help determine why pods aren’t transitioning from pending to scheduled:
scheduler_pending_pods
scheduler_schedule_attempts_total
scheduler_preemption_attempts_total
scheduler_preemption_victims
scheduler_scheduling_attempt_duration_seconds
An increase in the number of pending pods may indicate a scheduling issue, possibly caused by an underlying resource issue.
The new metrics are all displayed in the Kubernetes Engine section of the Cloud Console, on the Observability tab under Control Plane.
With this integration, it is possible to create alert policies in Cloud Alerting on these newly available metrics. Continuing with the scheduling issues described above, an alert could be created on both a preemption metric (for example, scheduler_preemption_attempts_total) and scheduler_pending_pods. A rise in the preemption metric alone could indicate that higher-priority pods are preventing other pods from being scheduled. However, an increase in both metrics could mean that there are not enough resources available for pods.
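The reasoning above can be sketched as a small decision function. This is only an illustration: it assumes the two inputs are recent samples of preemption attempts (such as scheduler_preemption_attempts_total from the list earlier) and of scheduler_pending_pods, and the function names, the "rising" test, and the returned messages are all hypothetical.

```python
from typing import Sequence


def is_rising(samples: Sequence[float], min_increase: float = 0.0) -> bool:
    """Return True if the last sample exceeds the first by more than min_increase."""
    return len(samples) >= 2 and samples[-1] - samples[0] > min_increase


def diagnose_scheduling(preemption_attempts: Sequence[float],
                        pending_pods: Sequence[float]) -> str:
    """Map the trend of two scheduler metrics to a likely explanation."""
    preempting = is_rising(preemption_attempts)
    pending = is_rising(pending_pods)
    if preempting and pending:
        # Both metrics rising: pods may lack resources altogether.
        return "possible resource shortage"
    if preempting:
        # Only preemption rising: higher-priority pods may be crowding others out.
        return "possible preemption by higher-priority pods"
    if pending:
        return "pending pods rising; investigate scheduling"
    return "no trend detected"


# Hypothetical samples taken over a short window.
print(diagnose_scheduling([10, 12, 30], [2, 2, 2]))
```

A real alert policy would express the same conditions as thresholds in Cloud Monitoring rather than in application code.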
When enabled, metrics are collected using the Google Cloud Managed Service for Prometheus. Metrics will be sent to Cloud Monitoring in the same GCP project as the Kubernetes cluster. These metrics can then be queried using PromQL through the Cloud Monitoring API and Metrics Explorer. Additionally, any third-party observability tool could ingest the metrics using the Cloud Monitoring API.
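A minimal sketch of querying one of these metrics with PromQL from a script might look like the following. It assumes the Prometheus-compatible query endpoint of the Cloud Monitoring API has the URL form shown in the code, which should be checked against the current documentation, and that an OAuth access token has already been obtained elsewhere (for example with gcloud auth print-access-token). The project ID, token, query, and sample response are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_request(project_id: str, promql: str,
                        access_token: str) -> urllib.request.Request:
    """Build an instant-query request; the URL pattern is an assumption to verify."""
    base = (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query"
    )
    url = base + "?" + urllib.parse.urlencode({"query": promql})
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


def extract_values(response_body: str) -> List[Tuple[Dict[str, str], float]]:
    """Pull (labels, value) pairs out of a Prometheus instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return [
        (series["metric"], float(series["value"][1]))
        for series in payload["data"]["result"]
    ]


# Placeholder values; nothing is sent over the network here.
request = build_query_request("my-project", "sum(scheduler_pending_pods)", "TOKEN")
print(request.full_url)

# Illustrative response in the standard Prometheus HTTP API format.
sample = ('{"status": "success", "data": {"resultType": "vector", '
          '"result": [{"metric": {}, "value": [1700000000, "3"]}]}}')
print(extract_values(sample))
```

To run the query for real, the request would be passed to urllib.request.urlopen and the response body handed to extract_values.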
GKE clusters running on control plane version 1.23.6 or later can access Kubernetes API Server, Scheduler, and Controller Manager metrics. Note that these metrics are not available for GKE Autopilot clusters. The following command can be used to enable metrics collection from the API server, scheduler, and controller manager:
gcloud container clusters update [CLUSTER_ID] --zone=[ZONE] --project=[PROJECT_ID] --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER
Metrics can also be configured through Terraform using the monitoring_config block.
Kubernetes Control Plane metrics are billed at the standard price for metrics ingested by Google Cloud Managed Service for Prometheus. For more details on the release, please see the blog post.