perf(infra): reduce Grafana metrics cardinality
GitHub issue · Closed
Resolves N/A
Reduces Grafana Cloud active series and DPM from the managed Kubernetes monitoring chart by tightening metric allow/drop rules for the highest-cardinality sources.
This adjusts the k8s-monitoring wrapper to drop unused high-cardinality metrics from cAdvisor, kube-state-metrics, node-exporter, annotation autodiscovery, and Alloy self-scrapes. It also routes CNPG instance metrics through a relabel step so cnpg_pg_settings_setting keeps only name="max_connections", which is the only setting consumed by the CNPG dashboard.
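The effect of the CNPG relabel step can be modelled in a few lines of Python. This is only an illustrative sketch of the keep/drop decision, not the chart's actual Alloy or Prometheus relabel configuration; the function name `keep_series` and the sample series (including `shared_buffers` and `cnpg_collector_up`) are assumptions made for the example.

```python
# Hypothetical model of the CNPG relabel step described above.
# It mirrors the intended outcome only; the real rule lives in the
# k8s-monitoring chart values and is not reproduced here.

SETTING_METRIC = "cnpg_pg_settings_setting"
KEPT_SETTING = "max_connections"


def keep_series(labels: dict[str, str]) -> bool:
    """Return True if a series should survive the relabel step."""
    if labels.get("__name__") != SETTING_METRIC:
        # Other CNPG metrics pass through this step untouched.
        return True
    # For the settings metric, keep only the max_connections series.
    return labels.get("name") == KEPT_SETTING


# Sample series, invented for illustration.
series = [
    {"__name__": "cnpg_pg_settings_setting", "name": "max_connections"},
    {"__name__": "cnpg_pg_settings_setting", "name": "shared_buffers"},
    {"__name__": "cnpg_collector_up"},
]

kept = [s for s in series if keep_series(s)]
```

In this model every `cnpg_pg_settings_setting` series other than `name="max_connections"` is discarded, while unrelated metrics are left alone, which is the behaviour the description attributes to the relabel step.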
The goal is to preserve the metrics used by the Kubernetes app and Tuist dashboards while removing duplicated or unused series such as extra container_memory_* gauges, pod status reasons, node CPU governor labels, Cilium internals, Loki write latency buckets, and unused Phoenix response-size buckets.
How to test locally
- helm lint infra/helm/k8s-monitoring -f infra/helm/k8s-monitoring/values-staging.yaml
- helm template k8s-monitoring infra/helm/k8s-monitoring -n observability -f infra/helm/k8s-monitoring/values-staging.yaml >/tmp/k8s-monitoring-rendered.yaml
- Rendered relabel rules were inspected in /tmp/k8s-monitoring-rendered.yaml for the expected metric drops and the CNPG max_connections keep rule.
- kubectl apply --dry-run=client was attempted, but it requires API discovery and failed because no local Kubernetes API was reachable at localhost:8080.
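The manual inspection of the rendered manifest could be partly scripted. The sketch below assumes the rendered file contains the metric and label names as plain text; the helper `missing_patterns` and the `EXPECTED` list are hypothetical and would need to be adapted to the rules actually rendered by the chart.

```python
from pathlib import Path


def missing_patterns(rendered: str, expected: list[str]) -> list[str]:
    """Return the expected substrings that do not appear in the rendered text."""
    return [pattern for pattern in expected if pattern not in rendered]


# Hypothetical strings to look for; adjust to the real rendered rules.
EXPECTED = ["cnpg_pg_settings_setting", "max_connections"]

if __name__ == "__main__":
    # Path taken from the helm template command above.
    path = Path("/tmp/k8s-monitoring-rendered.yaml")
    if path.exists():
        # An empty list means every expected string was found.
        print(missing_patterns(path.read_text(), EXPECTED))
```

A substring check like this only confirms that the names are present somewhere in the output; it does not validate that the relabel rules are attached to the right scrape job, so it complements rather than replaces reading the rendered rules.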