
fix(server): unblock recent bucket production deploy

GitHub issue · Closed

Metadata

Source: tuist/tuist #11241
Updated: Jun 24, 2026
Domains: Kura

Details

What changed

  • Reworked the 20260515100000 ClickHouse migration so the 100/250/500/750 recent-run bucket tables are backfilled from the existing test_case_runs_recent_per_case aggregate instead of rescanning raw test_case_runs partitions.
  • Kept the backfill chunked by project and retained an immediate per-project fallback if ClickHouse still reports memory pressure for a chunk.
  • Mirrored the Kura introspection OAuth keys into server-external-secrets when the managed deployment uses the newer kura-shared-secrets path, preserving compatibility with old server ReplicaSets during failed or partial upgrades.
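
The chunk-then-fallback control flow in the second bullet can be sketched as follows. This is a minimal Python illustration, not the Elixir migration itself: `MemoryLimitExceeded`, `run_backfill`, and the default chunk size are hypothetical names and values chosen for the example, and the real migration issues ClickHouse queries where this sketch calls a callback.

```python
from typing import Callable, Iterable, List, Sequence


class MemoryLimitExceeded(Exception):
    """Hypothetical stand-in for a ClickHouse memory-limit error."""


def chunks(items: Sequence[int], size: int) -> Iterable[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def backfill(
    project_ids: Sequence[int],
    run_backfill: Callable[[List[int]], None],
    chunk_size: int = 50,  # assumed value, not taken from the migration
) -> None:
    """Backfill in project chunks; if a chunk hits memory pressure,
    immediately retry that same chunk one project at a time."""
    for chunk in chunks(project_ids, chunk_size):
        try:
            run_backfill(chunk)
        except MemoryLimitExceeded:
            # Fallback: same projects, one backfill call per project.
            for project_id in chunk:
                run_backfill([project_id])
```

Only the chunk that failed is split into per-project calls, so the other chunks keep the cheaper batched path.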

Why

The previous chunking fix changed the failure mode but did not unblock production: the migration continued scanning historical raw rows for each smaller bucket, so Helm hit its pre-upgrade hook timeout while the job was still backfilling. Because the migration drops and recreates the bucket tables at the start, every retry could restart from zero.

The same failed upgrade also exposed a secret compatibility gap. New server pods read Kura OAuth credentials from kura-shared-secrets, but old ReplicaSet pods still referenced server-external-secrets. The pre-upgrade ExternalSecret hook had stopped writing those legacy keys, so old pods could get stuck in CreateContainerConfigError while Helm was still trying to complete the release.

Impact

The migration now reshapes the already-maintained 1000-run per-case aggregate into smaller sorted bucket states, avoiding the raw historical table scan that pushed the backfill past Helm's pre-upgrade hook timeout. The server ExternalSecret remains backward-compatible long enough for old pods to roll away cleanly.
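
As a conceptual model of that reshaping, the sketch below derives each smaller bucket from the runs already held in the 1000-run aggregate. It assumes that a bucket of size N holds the N most recent runs of a test case, which the table names suggest but the description does not spell out. The `Run` tuple layout and the function name are hypothetical, and the real migration works on ClickHouse aggregate states, not Python lists.

```python
from typing import Dict, List, Tuple

BUCKET_SIZES = (100, 250, 500, 750)  # bucket sizes named in the change
AGGREGATE_LIMIT = 1000               # size of the existing per-case aggregate

Run = Tuple[int, str]  # (timestamp, status); hypothetical layout


def reshape_recent_runs(recent_runs: List[Run]) -> Dict[int, List[Run]]:
    """Build every smaller bucket for one test case from its aggregate.

    Each bucket is a newest-first prefix of the same sorted list, so no
    run outside the aggregate ever needs to be read.
    """
    newest_first = sorted(recent_runs, key=lambda run: run[0], reverse=True)
    newest_first = newest_first[:AGGREGATE_LIMIT]
    return {size: newest_first[:size] for size in BUCKET_SIZES}
```

Under this assumption every bucket is a subset of the 1000-run aggregate, which is why the backfill can skip the raw table entirely.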

Validation

  • helm template for production and canary managed values, confirming server-external-secrets now contains KURA_CONTROL_PLANE_CLIENT_ID and KURA_CONTROL_PLANE_CLIENT_SECRET from kura-introspection-oauth-client while new server pods still reference kura-shared-secrets.
  • helm lint for production and canary managed values.
  • clickhouse local smoke test using the exact aggregate-state backfill query against sample data.
  • mix format --check-formatted priv/ingest_repo/migrations/20260515100000_create_test_case_runs_recent_buckets_per_case_mvs.exs after fetching deps; the lockfile changes produced by the local refresh were reverted before committing.
  • Elixir syntax parse and standard formatter check for the changed migration file.
Comments
tuist-atlas[bot] Jun 12, 2026

The fix for unblocking the recent bucket production deploy is now available in server@1.210.4. Update to this version to use it.