perf(infra): reduce Grafana metrics cardinality

GitHub issue · Closed

Metadata
Source: tuist/tuist #11383
Updated: Jun 24, 2026
Domains: Compute
Details

Resolves N/A

Reduces the Grafana Cloud active series and DPM (data points per minute) produced by the managed Kubernetes monitoring chart by tightening the metric allow/drop rules for the highest-cardinality sources.

This adjusts the k8s-monitoring wrapper chart to drop unused high-cardinality metrics from cAdvisor, kube-state-metrics, node-exporter, annotation autodiscovery, and Alloy self-scrapes. It also routes CNPG instance metrics through a relabel step so that cnpg_pg_settings_setting keeps only the series with name="max_connections", the only setting the CNPG dashboard consumes.
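
The intended effect of the CNPG relabel step can be modeled as a small filter. The following is a minimal Python sketch of the semantics only, not the relabel configuration from this change. The function names and the sample series (including the hypothetical `cnpg_example_metric` and the `work_mem` setting) are assumptions for illustration.

```python
# Illustrative model of the CNPG keep rule; not the chart's actual config.
from typing import Dict, Iterable, List

Series = Dict[str, str]  # label name -> label value, including "__name__"

SETTINGS_METRIC = "cnpg_pg_settings_setting"
KEPT_SETTINGS = {"max_connections"}  # the only setting the CNPG dashboard reads


def keep_series(series: Series) -> bool:
    """Return True if the series survives the relabel step."""
    if series.get("__name__") != SETTINGS_METRIC:
        return True  # every other CNPG metric passes through untouched
    # For the settings metric, keep only the allowed "name" label values.
    return series.get("name") in KEPT_SETTINGS


def apply_relabel(scrape: Iterable[Series]) -> List[Series]:
    """Filter one scrape's worth of series through the keep rule."""
    return [s for s in scrape if keep_series(s)]


if __name__ == "__main__":
    # Made-up sample series for demonstration only.
    sample = [
        {"__name__": "cnpg_pg_settings_setting", "name": "max_connections"},
        {"__name__": "cnpg_pg_settings_setting", "name": "work_mem"},
        {"__name__": "cnpg_example_metric", "datname": "app"},
    ]
    for s in apply_relabel(sample):
        print(s)
```

In this model the `work_mem` series is dropped, while the `max_connections` series and the unrelated metric are kept. This matches the stated goal of removing one series per unused PostgreSQL setting without touching other CNPG metrics.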

The goal is to preserve the metrics used by the Kubernetes app and Tuist dashboards while removing duplicated or unused series such as extra container_memory_* gauges, pod status reasons, node CPU governor labels, Cilium internals, Loki write latency buckets, and unused Phoenix response-size buckets.

How to test locally

  • helm lint infra/helm/k8s-monitoring -f infra/helm/k8s-monitoring/values-staging.yaml
  • helm template k8s-monitoring infra/helm/k8s-monitoring -n observability -f infra/helm/k8s-monitoring/values-staging.yaml >/tmp/k8s-monitoring-rendered.yaml
  • Rendered relabel rules were inspected in /tmp/k8s-monitoring-rendered.yaml for the expected metric drops and CNPG max_connections keep rule.
  • kubectl apply --dry-run=client was attempted, but it requires API discovery and failed because no local Kubernetes API was reachable at localhost:8080.
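
The manual inspection of the rendered manifest could be partly scripted. The sketch below is one possible approach in Python. It assumes the rendered relabel rules contain the literal strings `cnpg_pg_settings_setting` and `max_connections`; the helper name `missing_markers` and the marker list are hypothetical and should be adjusted to whatever the chart actually renders.

```python
# Sketch: report which expected substrings are absent from the rendered manifest.
from pathlib import Path
from typing import Iterable, List

# Assumed markers for the CNPG keep rule; extend with the metric names
# you expect to see in the drop rules.
EXPECTED = ["cnpg_pg_settings_setting", "max_connections"]


def missing_markers(rendered: str, markers: Iterable[str]) -> List[str]:
    """Return the markers that do not appear anywhere in the rendered text."""
    return [m for m in markers if m not in rendered]


if __name__ == "__main__":
    path = Path("/tmp/k8s-monitoring-rendered.yaml")
    if path.exists():
        missing = missing_markers(path.read_text(), EXPECTED)
        print("missing markers:", missing if missing else "none")
    else:
        print(f"{path} not found; run the helm template step first")
```

A substring check only confirms that the names appear somewhere in the output. It does not prove that the rules are attached to the right scrape job, so it complements reading the rendered rules and does not replace it.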