feat(kura): geographic attribution (country + subdivision) on metrics and traces

GitHub issue · Closed

Metadata
Source: tuist/tuist #10799
Updated: Jun 24, 2026
Domains: Kura

Details

Summary

Kura now enriches telemetry with the geographic location of both the request client and the serving node, so we can decide directly from Tempo/Prometheus data where a customer would benefit from a node in a new region.

  • The container image vendors DB-IP City Lite (CC BY 4.0) at /opt/geoip/dbip-city-lite.mmdb, so attribution is on by default with no network access at cold start. When the file is absent on custom builds, Kura soft-fails: it logs a warning and runs without attribution.
  • Client IP (read from X-Forwarded-For / X-Real-IP) is resolved to two granularities:
    • Country (ISO 3166-1): client_country Prometheus label on kura_http_requests_total and geo.country.iso_code OTel span attribute. Falls back to unknown when the header is missing, the IP is private, or the DB is absent.
    • Subdivision (ISO 3166-2, e.g. US-CA): geo.region.iso_code OTel span attribute only.
  • The serving node resolves its own country and subdivision once at startup from a single egress-IP probe and stamps them as geo.country.iso_code / geo.region.iso_code OTel Resource attributes. KURA_NODE_COUNTRY / KURA_NODE_SUBDIVISION override the probe; country additionally falls back to the KURA_REGION prefix.
  • A background tokio task re-downloads the monthly City dump every KURA_GEOIP_REFRESH_INTERVAL_SECS (default 86400, 0 disables) and swaps the in-process RwLock<Reader<Vec<u8>>> under a microsecond write guard, so concurrent lookups never observe a partial state. Outcomes are tracked in kura_geoip_refresh_total{result="ok|http_error|parse_error"}.
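
The swap described in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not Kura's code: `GeoDb` and `SharedDb` are hypothetical stand-ins for the real `RwLock<Reader<Vec<u8>>>`, and the `Option` wrapper is an assumption used here to model the soft-fail case where no database is loaded.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the parsed MMDB reader.
pub struct GeoDb {
    pub version: u32,
    pub bytes: Vec<u8>,
}

/// Shared handle: lookups take the read lock, the refresh task takes the write lock.
#[derive(Clone)]
pub struct SharedDb(Arc<RwLock<Option<GeoDb>>>);

impl SharedDb {
    /// `None` models a build where the database file is absent.
    pub fn new(initial: Option<GeoDb>) -> Self {
        SharedDb(Arc::new(RwLock::new(initial)))
    }

    /// Install a fully built database. The caller downloads and parses it
    /// before calling this, so the write guard only covers a pointer-sized move.
    pub fn swap(&self, fresh: GeoDb) {
        let mut guard = self.0.write().unwrap();
        let old = guard.replace(fresh);
        drop(guard); // release the lock first
        drop(old); // then free the previous database outside the lock
    }

    /// Run a lookup against whichever complete database is currently installed.
    pub fn with_db<T>(&self, f: impl FnOnce(&GeoDb) -> T) -> Option<T> {
        let guard = self.0.read().unwrap();
        guard.as_ref().map(f)
    }
}
```

Because readers only ever see the value before or after the assignment, a lookup cannot observe a half-written database; the expensive work (download, decompression, parsing) happens before the write lock is taken.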

Naming

Span and Resource attributes use the OpenTelemetry geo.* semantic conventions (geo.country.iso_code, geo.region.iso_code) so standard geo tooling understands them. The client_country Prometheus label is kept as-is (short, Prometheus-idiomatic; semconv covers spans/logs/resource, not metric label names). The existing kura.region resource attribute (cloud deployment region, e.g. fr-par) is untouched and does not collide thanks to the geo. namespace.
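
Since kura.region and geo.country.iso_code are separate attributes, the node's country has to be resolved on its own. The precedence from the summary (override, then probe, then KURA_REGION prefix) might look roughly like the sketch below. The function names are hypothetical, and the rule for deriving a country from a region string (two ASCII letters before the first hyphen) is an assumption, not something the PR states.

```rust
/// Resolve the node country: explicit override first, then the egress-IP
/// probe result, then the prefix of the deployment region (assumed rule).
pub fn node_country(
    override_value: Option<&str>, // e.g. the value of KURA_NODE_COUNTRY
    probed: Option<&str>,         // country returned by the startup probe
    region: Option<&str>,         // e.g. the value of KURA_REGION, "fr-par"
) -> Option<String> {
    override_value
        .or(probed)
        .map(|c| c.to_ascii_uppercase())
        .or_else(|| region.and_then(country_from_region))
}

/// Assumption: a region such as "fr-par" starts with a two-letter country code.
fn country_from_region(region: &str) -> Option<String> {
    let prefix = region.split('-').next()?;
    if prefix.len() == 2 && prefix.chars().all(|c| c.is_ascii_alphabetic()) {
        Some(prefix.to_ascii_uppercase())
    } else {
        None
    }
}
```

A shape check like this accepts any two-letter prefix, so a real implementation would presumably also validate the result against the ISO 3166-1 list.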

Why subdivision is traces-only

ISO 3166-2 has thousands of codes. Adding it as a Prometheus label, crossed with route × method × status, would inflate the active series on the busiest counter. Country stays on metrics because it is bounded (~250 codes); subdivision lives on sampled traces, which is enough to compute per-request geographic distance.
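
To make the cardinality argument concrete, here is a small worked calculation. All counts except the ~250 countries are hypothetical, picked only to show the ratio; 5,000 stands in for "thousands" of ISO 3166-2 codes.

```rust
/// Upper bound on active series for one counter: the product of the number
/// of distinct values each label can take.
pub fn max_series(label_cardinalities: &[u64]) -> u64 {
    label_cardinalities.iter().product()
}

// Hypothetical label counts, for illustration only.
pub const ROUTES: u64 = 20;
pub const METHODS: u64 = 5;
pub const STATUSES: u64 = 10;
pub const COUNTRIES: u64 = 250; // bounded, per the text above
pub const SUBDIVISIONS: u64 = 5_000; // stand-in for "thousands of codes"
```

With these numbers, `max_series(&[ROUTES, METHODS, STATUSES, COUNTRIES])` is 250,000, while swapping the country label for a subdivision label gives 5,000,000, a 20x larger ceiling. Real traffic never fills the full cross product, but the ratio between the two bounds is what matters.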

Resource bounds

The City dump is ~60 MiB compressed / ~125 MiB decompressed today. Refresh is bounded to 128 MiB compressed / 256 MiB decompressed with a 60-second timeout; the download streams the body and aborts as soon as it crosses the compressed cap. The reader holds the database in plain process memory (not charged against Kura’s internal cache-accounting budget) and sits comfortably within the pod memory limit. Per request, the only added cost is one MMDB tree walk under a read lock.
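
The abort-on-cap behaviour can be illustrated with a plain `std::io::Read` source. This is a sketch of the idea only: the PR streams a reqwest response body, whereas `read_capped` below is a hypothetical helper that works on any reader.

```rust
use std::io::{self, Read};

/// Read `source` to the end, failing as soon as more than `cap` bytes arrive.
pub fn read_capped<R: Read>(mut source: R, cap: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = [0u8; 8192];
    loop {
        let n = source.read(&mut chunk)?;
        if n == 0 {
            return Ok(out); // end of stream, still within the cap
        }
        if out.len() + n > cap {
            // Abort before buffering the chunk that would exceed the cap.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "body exceeds size cap",
            ));
        }
        out.extend_from_slice(&chunk[..n]);
    }
}
```

The buffer never grows past `cap` bytes, regardless of how large the remote body turns out to be. The same pattern would apply a second time to the decompressed stream with the larger cap.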

Privacy

Raw client IPs are never persisted: the address is resolved to a coarse country/subdivision in process and immediately discarded, so there is no per-account or per-user record. The emitted attributes and labels are aggregate operational signals with no account scope, so this is intentionally not added to server/data-export.md (that document is scoped to account-specific data exportable on legal request).
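
A minimal sketch of the resolve-then-discard flow, under stated assumptions: the left-most X-Forwarded-For entry is taken as the client (the PR does not say which entry Kura uses), the `lookup` closure stands in for the MMDB query, and all function names are hypothetical. The only value that leaves the function is the coarse label.

```rust
use std::net::IpAddr;

fn parse_ip(s: &str) -> Option<IpAddr> {
    s.trim().parse().ok()
}

/// Assumption: prefer the left-most X-Forwarded-For entry, then X-Real-IP.
pub fn client_ip(x_forwarded_for: Option<&str>, x_real_ip: Option<&str>) -> Option<IpAddr> {
    x_forwarded_for
        .and_then(|v| v.split(',').next())
        .and_then(parse_ip)
        .or_else(|| x_real_ip.and_then(parse_ip))
}

/// Rough filter for addresses that cannot be geolocated.
fn is_public(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            !(v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_unspecified())
        }
        IpAddr::V6(v6) => {
            let unique_local = (v6.segments()[0] & 0xfe00) == 0xfc00; // fc00::/7
            !(v6.is_loopback() || v6.is_unspecified() || unique_local)
        }
    }
}

/// Returns a country label, or "unknown" when the header is missing, the
/// address is private, or the lookup has no answer. The IP is dropped here.
pub fn client_country(
    x_forwarded_for: Option<&str>,
    x_real_ip: Option<&str>,
    lookup: impl Fn(IpAddr) -> Option<String>,
) -> String {
    client_ip(x_forwarded_for, x_real_ip)
        .filter(|ip| is_public(*ip))
        .and_then(lookup)
        .unwrap_or_else(|| "unknown".to_string())
}
```

Since the return type is a plain label, nothing downstream (metrics, spans, logs) can accidentally record the address.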

Testing

  • mise exec -- cargo fmt
  • mise exec -- cargo clippy --all-targets -- -D warnings (from kura/, clean)
  • mise exec -- cargo test (151 passed, 0 failed)
Comments
pepicrft May 16, 2026

Thanks Marek. Consolidating my updates into one comment. On your three points:

  1. Streaming download: fixed in 4a3e8b38a2. It streams the body and bails the moment it crosses the compressed cap, so reqwest never buffers an oversized response.

  2. Subdivision: took your suggestion and ran with it. Switched the vendored DB to DB-IP City Lite, so we now emit ISO 3166-2 (US-CA, CA-ON, AU-NSW, etc.) for both the client and the serving node, falling back to country when subdivision is missing. It’s traces-only and deliberately off the Prometheus labels, since ISO 3166-2 is thousands of codes and would blow up cardinality on the busiest counter; country stays on metrics since it’s bounded (~250 codes).

Two follow-on decisions baked in:

  • Naming follows the OTel geo.* semantic conventions (geo.country.iso_code / geo.region.iso_code) on spans and the node resource, so Grafana/Tempo read them directly. The client_country Prometheus label is kept as-is and kura.region (the cloud deployment region) is untouched; the geo. namespace keeps them from clashing.
  • Resource bounds: measured the City dump at ~60 MiB compressed / ~125 MiB decompressed, refresh capped at 128/256 MiB. For the record, my earlier memory-budget worry was off: the 768 MiB is Kura’s internal cache-accounting Helm value, not an OS ceiling, and the real limit is the ~2Gi pod, which we’re comfortably within.

PR title and description are updated to match. Good call starting at subdivision, much more actionable for the big markets you flagged.