perf(infra): reduce Grafana metrics cardinality

Metadata

Source

tuist/tuist #11383

Updated

Jun 24, 2026

Domains

Compute

Details

Resolves N/A

Reduces Grafana Cloud active series and DPM (data points per minute) from the managed Kubernetes monitoring chart by tightening metric allow/drop rules for the highest-cardinality sources.

This adjusts the k8s-monitoring wrapper to drop unused high-cardinality metrics from cAdvisor, kube-state-metrics, node-exporter, annotation autodiscovery, and Alloy self-scrapes. It also routes CNPG instance metrics through a relabel step so cnpg_pg_settings_setting keeps only name="max_connections", which is the only setting consumed by the CNPG dashboard.
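To make the intent of the CNPG keep rule concrete, here is a minimal shell sketch that emulates its effect on Prometheus text-format lines. It is an illustration only, not the relabel configuration from this PR: the function name `filter_cnpg_settings` and the sample series in the usage note are hypothetical.

```shell
# Illustration only: emulate the keep rule on Prometheus text-format input.
# Lines for cnpg_pg_settings_setting pass only when name="max_connections";
# every other metric passes through untouched.
filter_cnpg_settings() {
  awk '
    /^cnpg_pg_settings_setting[{]/ {
      if ($0 ~ /name="max_connections"/) print
      next
    }
    { print }
  '
}
```

For example, piping a made-up `cnpg_pg_settings_setting{name="work_mem"} 4096` line through `filter_cnpg_settings` drops it, while a `name="max_connections"` line and unrelated metrics are kept.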

The goal is to preserve the metrics used by the Kubernetes app and Tuist dashboards while removing duplicated or unused series such as extra container_memory_* gauges, pod status reasons, node CPU governor labels, Cilium internals, Loki write latency buckets, and unused Phoenix response-size buckets.

How to test locally

helm lint infra/helm/k8s-monitoring -f infra/helm/k8s-monitoring/values-staging.yaml
helm template k8s-monitoring infra/helm/k8s-monitoring -n observability -f infra/helm/k8s-monitoring/values-staging.yaml >/tmp/k8s-monitoring-rendered.yaml
The rendered relabel rules in /tmp/k8s-monitoring-rendered.yaml were inspected for the expected metric drops and the CNPG max_connections keep rule.
kubectl apply --dry-run=client was attempted but failed: a client-side dry run still requires API discovery, and no local Kubernetes API was reachable at localhost:8080.
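The manual inspection of the rendered file could be partly scripted. The sketch below assumes a hypothetical helper, `check_rendered`, that does a fixed-string search for each expected identifier; the patterns to pass are up to the reviewer, and a match only shows that the string is present, not that the rule behaves as intended.

```shell
# Hypothetical helper: succeed only if every pattern appears in the given file.
check_rendered() {
  file=$1
  shift
  for pattern in "$@"; do
    # -F treats the pattern as a fixed string, -q suppresses output.
    if ! grep -qF -- "$pattern" "$file"; then
      echo "missing: $pattern" >&2
      return 1
    fi
  done
  echo "all patterns found"
}

# Example call, to be run after the helm template step:
# check_rendered /tmp/k8s-monitoring-rendered.yaml cnpg_pg_settings_setting max_connections
```

A non-zero exit status names the first missing pattern on stderr, which makes the check easy to wire into a pre-merge script.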
