
fix(kura, infra): restore logs and geography on the Tuist Kura dashboard

GitHub issue · Closed

Metadata
Source: tuist/tuist #10873
Updated: Jun 24, 2026
Domains: Kura
Details

Kura’s Grafana dashboard showed traces but “No data” for Kura Logs and Tenant Geography. Root cause was two independent breaks in the Prometheus pipeline; traces were unaffected because Kura pushes OTLP directly and the traces panel filters on service.namespace="kura" with no cluster filter.

This PR fixes both:

  • fix(kura): Kura pods had no prometheus.io/* annotations, so the managed k8s-monitoring annotationAutodiscovery never scraped :4000/metrics. As a result, kura_node_info, kura_node_geo_info, and kura_http_requests_total never reached Grafana Cloud, starving every Prometheus panel and template variable. Added scrape, port-name, and path pod annotations in the base chart values. port-name: http pins discovery to the single http port, since the pod also declares grpc and peer. The annotations are inert when no scraper is present, and self-hosters’ embedded Prometheus benefits from them too.
  • fix(infra): The dashboard hardcoded cluster="kura-tuist", a value no cluster emits (overlays use tuist-{staging,canary,production}). As a result, every Prometheus query and the tenant/instance/region/route/status variables matched nothing. That also broke the Kura Logs panel, which joins on the instance variable. Added a Cluster template variable (label_values(kura_node_info, cluster), defaults to All) and replaced all 49 hardcoded cluster="kura-tuist" matchers with cluster=~"${cluster:regex}" across panels and variables. The traces panel is left unchanged.
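
The matcher rewrite in the fix(infra) change can be sketched in Python as a recursive walk over the dashboard JSON. This is an illustration only, not the script used for the PR: the function name rewrite_cluster_filters and the miniature sample dashboard (including its tenant query) are hypothetical, and only the two matcher strings come from the description above.

```python
import json

# The two matcher strings are taken from the PR description.
OLD = 'cluster="kura-tuist"'
NEW = 'cluster=~"${cluster:regex}"'


def rewrite_cluster_filters(node):
    """Replace the hardcoded matcher in every string value of a JSON tree.

    Returns a (new_node, replacement_count) tuple; the input is not mutated.
    """
    if isinstance(node, str):
        return node.replace(OLD, NEW), node.count(OLD)
    if isinstance(node, list):
        results = [rewrite_cluster_filters(item) for item in node]
        return [r[0] for r in results], sum(r[1] for r in results)
    if isinstance(node, dict):
        out, total = {}, 0
        for key, value in node.items():
            out[key], count = rewrite_cluster_filters(value)
            total += count
        return out, total
    # Numbers, booleans, and None pass through unchanged.
    return node, 0


# Hypothetical miniature dashboard; the real file has many more panels.
sample = {
    "templating": {
        "list": [
            {
                "name": "tenant",
                "query": 'label_values(kura_node_info{cluster="kura-tuist"}, tenant)',
            }
        ]
    },
    "panels": [
        {
            "targets": [
                {"expr": 'sum(rate(kura_http_requests_total{cluster="kura-tuist"}[5m]))'}
            ]
        }
    ],
}

rewritten, replaced = rewrite_cluster_filters(sample)
print(replaced)
print(json.dumps(rewritten, indent=2))
```

Returning the replacement count makes it easy to compare against the expected number of matchers (49 in this PR) before writing the file back.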

Follow-up to verify after deploy: the Kura Logs panel filters app_kubernetes_io_instance=~"${instance:regex}". ${instance} resolves from the Prometheus instance label on kura_node_info, while the Loki side label is the Helm release name. If those values don’t align once metrics flow, the logs panel may need a small follow-up (align the variable or switch the Loki filter to pod). The two structural blockers fixed here are independent of that.

How to test locally

  • python3 -c "import json; json.load(open('infra/grafana-dashboards/tuist-kura.json'))" confirms the dashboard JSON is valid. Parsing alone does not detect leftover literals, so a text search of the file for kura-tuist should also return no matches.
  • helm lint kura/ops/helm/kura -f kura/ops/helm/kura/values.yaml passes; helm template renders the three prometheus.io/* annotations on the StatefulSet pod template.
  • Full verification is post-deploy: after the Kura clusters redeploy, the Tenant Geography and Prometheus panels populate and the template variables resolve.
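
The two dashboard checks from the first bullet (valid JSON, no leftover kura-tuist literals) can be combined in one small script. This is a minimal sketch, assuming it runs from the repository root. The helper name check_dashboard is hypothetical, and the file path is the one quoted above.

```python
import json
from pathlib import Path

# Path as quoted in the PR description, relative to the repository root.
DASHBOARD = Path("infra/grafana-dashboards/tuist-kura.json")


def check_dashboard(text):
    """Return a list of problems found in the dashboard JSON text."""
    problems = []
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc}")
    # Plain substring count: catches the old literal anywhere in the file.
    leftovers = text.count("kura-tuist")
    if leftovers:
        problems.append(f"{leftovers} kura-tuist literal(s) remain")
    return problems


if DASHBOARD.exists():
    print(check_dashboard(DASHBOARD.read_text()) or "ok")
```

An empty list means both checks passed. The substring search is deliberately broad, so it would also flag the literal in a comment or description field.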