fix(kura, infra): restore logs and geography on the Tuist Kura dashboard
GitHub issue · Closed
Kura’s Grafana dashboard showed traces but “No data” for the Kura Logs and Tenant Geography panels. The root cause was two independent breaks in the Prometheus pipeline. Traces were unaffected because Kura pushes OTLP directly and the traces panel filters on `service.namespace="kura"` with no cluster filter.
This PR fixes both:
- fix(kura): Kura pods had no `prometheus.io/*` annotations, so the managed `k8s-monitoring` annotation Autodiscovery never scraped `:4000/metrics`. As a result, `kura_node_info`, `kura_node_geo_info`, and `kura_http_requests_total` never reached Grafana Cloud, starving every Prometheus panel and template variable. Added `scrape`/`port-name`/`path` pod annotations in the base chart values (`port-name: http` pins discovery to the single `http` port; the pod also declares `grpc` and `peer`). The annotations are inert when no scraper is present, and an annotation-aware scraper such as a self-hoster's embedded Prometheus picks them up too.
- fix(infra): The dashboard hardcoded `cluster="kura-tuist"`, a value no cluster emits (the overlays use `tuist-{staging,canary,production}`). Every Prometheus query and the `tenant`/`instance`/`region`/`route`/`status` variables therefore matched nothing. This also broke the Kura Logs panel, which joins on the `instance` variable. Added a `Cluster` template variable (`label_values(kura_node_info, cluster)`, defaulting to All) and replaced all 49 hardcoded `cluster="kura-tuist"` matchers with `cluster=~"${cluster:regex}"` across panels and variables. The traces panel is unchanged.
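The following minimal Python sketch makes the `port-name` choice concrete. It shows how an annotation-driven scraper could turn pod annotations into scrape URLs. It assumes the annotation keys are `prometheus.io/scrape`, `prometheus.io/port-name`, and `prometheus.io/path`, inferred from the "scrape/port-name/path" wording above. The `grpc`/`peer` port numbers and the pod IP are hypothetical. The sketch illustrates the selection logic only and does not reproduce the actual `k8s-monitoring` discovery implementation.

```python
# Hypothetical annotation keys, inferred from the scrape/port-name/path wording.
SCRAPE = "prometheus.io/scrape"
PORT_NAME = "prometheus.io/port-name"
PATH = "prometheus.io/path"


def resolve_scrape_targets(pod_ip: str, annotations: dict[str, str],
                           ports: dict[str, int]) -> list[str]:
    """Return the URLs this sketch's annotation-aware scraper would hit for one pod."""
    # Without an explicit opt-in, the pod is ignored entirely (the pre-fix state).
    if annotations.get(SCRAPE) != "true":
        return []
    path = annotations.get(PATH, "/metrics")
    port_name = annotations.get(PORT_NAME)
    if port_name is not None:
        # Pinning by name restricts discovery to that one declared port.
        selected = [ports[port_name]] if port_name in ports else []
    else:
        # Without a pin, every declared port becomes a candidate target.
        selected = list(ports.values())
    return [f"http://{pod_ip}:{port}{path}" for port in selected]


# Container ports as described in the PR; the grpc/peer numbers are made up.
kura_ports = {"http": 4000, "grpc": 50051, "peer": 7000}

before = resolve_scrape_targets("10.0.0.5", {}, kura_ports)
after = resolve_scrape_targets("10.0.0.5", {
    SCRAPE: "true",
    PORT_NAME: "http",
    PATH: "/metrics",
}, kura_ports)
unpinned = resolve_scrape_targets("10.0.0.5", {SCRAPE: "true"}, kura_ports)

print(before)         # []
print(after)          # ['http://10.0.0.5:4000/metrics']
print(len(unpinned))  # 3
```

Pinning by name rather than by number keeps the annotation correct if the `http` port number ever changes. It also keeps `grpc` and `peer` out of the target list; in this sketch, those ports would otherwise each become a target.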
Follow-up to verify after deploy: the Kura Logs panel filters on `app_kubernetes_io_instance=~"${instance:regex}"`. `${instance}` resolves from the Prometheus `instance` label on `kura_node_info`, while the Loki-side label holds the Helm release name. If those values don’t align once metrics flow, the logs panel may need a small follow-up (align the variable, or switch the Loki filter to `pod`). The two structural blockers fixed here are independent of that.
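The alignment question comes down to two things: how Grafana expands `${instance:regex}`, and how Loki's `=~` matcher compares the result against stream labels. The minimal Python sketch below makes two assumptions. First, Grafana's `regex` format escapes each value and wraps multiple values as `(a|b)`; the escaping here is an approximation of its formatter. Second, label regex matchers are fully anchored, as in Prometheus. The label values are hypothetical: Prometheus `instance` as a pod `IP:port`, and Loki `app_kubernetes_io_instance` as a Helm release named `kura`.

```python
import re

# Characters escaped before substitution (an approximation of Grafana's regex format).
_SPECIAL = re.compile(r"[\\^$*+?.()|\[\]{}/]")


def grafana_regex_format(values: list[str]) -> str:
    """Approximate ${var:regex}: escape each value, join multiple values as (a|b)."""
    escaped = [_SPECIAL.sub(lambda m: "\\" + m.group(0), v) for v in values]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def matching_streams(pattern: str, label_values: list[str]) -> list[str]:
    """Label =~ matchers are fully anchored, so use fullmatch, not search."""
    return [v for v in label_values if re.fullmatch(pattern, v)]


# Hypothetical values: Prometheus `instance` as IP:port, Loki label as release name.
prom_instances = ["10.0.0.5:4000", "10.0.0.6:4000"]
loki_release_labels = ["kura"]

pattern = grafana_regex_format(prom_instances)
print(pattern)                                         # (10\.0\.0\.5:4000|10\.0\.0\.6:4000)
print(matching_streams(pattern, loki_release_labels))  # []

# If the variable carried release names instead, the join would line up.
print(matching_streams(grafana_regex_format(["kura"]), loki_release_labels))  # ['kura']
```

Anchoring is why partial overlap is not enough. In this sketch, `kura` does not match a stream labelled `kura-2`, and an `IP:port` value cannot match a bare release name.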
How to test locally
- `python3 -c "import json; json.load(open('infra/grafana-dashboards/tuist-kura.json'))"` confirms the dashboard JSON is valid, and no `kura-tuist` literals remain.
- `helm lint kura/ops/helm/kura -f kura/ops/helm/kura/values.yaml` passes; `helm template` renders the three `prometheus.io/*` annotations on the StatefulSet pod template.
- Full verification is post-deploy: after the Kura clusters redeploy, the Tenant Geography and Prometheus panels populate and the template variables resolve.
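The `json.load` one-liner only proves the file parses; checking for leftover literals needs a search. The minimal Python sketch below does both and also confirms that a `cluster` template variable exists. It assumes the standard Grafana dashboard JSON layout, with variables under `templating.list` and each variable carrying a `name`. It also assumes the new variable is named `cluster`, as the `${cluster:regex}` matcher suggests. The two sample dashboards are hypothetical stand-ins for the real file.

```python
import json
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any

NEW_MATCHER = 'cluster=~"${cluster:regex}"'


@dataclass
class DashboardReport:
    kura_tuist_literals: int     # expected to be 0 after the fix
    cluster_regex_matchers: int  # occurrences of the templated matcher
    has_cluster_variable: bool   # a template variable named "cluster" exists


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested anywhere in parsed dashboard JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def check_dashboard(dashboard: dict[str, Any]) -> DashboardReport:
    strings = list(iter_strings(dashboard))
    variables = dashboard.get("templating", {}).get("list", [])
    return DashboardReport(
        kura_tuist_literals=sum(s.count("kura-tuist") for s in strings),
        cluster_regex_matchers=sum(s.count(NEW_MATCHER) for s in strings),
        has_cluster_variable=any(v.get("name") == "cluster" for v in variables),
    )


def check_file(path: str) -> DashboardReport:
    # json.load raises json.JSONDecodeError if the file is not valid JSON.
    with open(path, encoding="utf-8") as f:
        return check_dashboard(json.load(f))


# Hypothetical stand-ins: one dashboard after the fix, one with the old literal.
fixed = {
    "templating": {"list": [{"name": "cluster", "query": "label_values(kura_node_info, cluster)"}]},
    "panels": [{"targets": [{"expr": 'sum(rate(kura_http_requests_total{cluster=~"${cluster:regex}"}[5m]))'}]}],
}
legacy = {"panels": [{"targets": [{"expr": 'kura_node_info{cluster="kura-tuist"}'}]}]}

print(check_dashboard(fixed))
print(check_dashboard(legacy))
```

To check the real file, call `check_file("infra/grafana-dashboards/tuist-kura.json")`. The expected post-fix state is zero `kura-tuist` literals and `has_cluster_variable=True`.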