fix(server): chunk recent bucket backfill on memory pressure

GitHub issue · Closed

Metadata
Source: tuist/tuist #11237
Updated: Jun 24, 2026
Domains: Atlas

What changed

  • Keeps the normal monthly/project-chunk backfill path for test_case_runs_recent_N_per_case.
  • Falls back to day-sized chunks when ClickHouse reports MEMORY_LIMIT_EXCEEDED.
  • Falls back again to per-project day chunks if a day-sized project batch still exceeds memory.
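The fallback order above can be sketched as a small control-flow model. This is a minimal Python sketch under stated assumptions, not the migration's Elixir code: Chunk, backfill, run_insert and MemoryLimitExceeded are hypothetical names, and run_insert stands in for whatever executes one INSERT ... SELECT against ClickHouse. It shows that a chunk that fits runs once on the fast path, and only failing chunks are split further.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Tuple


class MemoryLimitExceeded(Exception):
    """Hypothetical stand-in for ClickHouse error 241 (MEMORY_LIMIT_EXCEEDED)."""


@dataclass(frozen=True)
class Chunk:
    start: date                   # inclusive
    end: date                     # exclusive
    project_ids: Tuple[int, ...]  # project batch covered by one INSERT ... SELECT


def day_chunks(chunk: Chunk) -> List[Chunk]:
    """Split a chunk into one-day chunks that keep the same project batch."""
    result = []
    day = chunk.start
    while day < chunk.end:
        result.append(Chunk(day, day + timedelta(days=1), chunk.project_ids))
        day += timedelta(days=1)
    return result


def per_project_chunks(chunk: Chunk) -> List[Chunk]:
    """Split a chunk into single-project chunks over the same date range."""
    return [Chunk(chunk.start, chunk.end, (pid,)) for pid in chunk.project_ids]


def backfill(chunk: Chunk, run_insert: Callable[[Chunk], None]) -> None:
    """Fast path first; split only the chunks that exceed the memory limit."""
    try:
        run_insert(chunk)  # normal monthly/project-chunk path
        return
    except MemoryLimitExceeded:
        pass
    for day in day_chunks(chunk):  # first fallback: day-sized chunks
        try:
            run_insert(day)
        except MemoryLimitExceeded:
            # second fallback: per-project day chunks; a failure here propagates
            for single in per_project_chunks(day):
                run_insert(single)
```

For example, with a fake run_insert that rejects every multi-day chunk and any multi-project chunk for 2026-05-10, backfilling May 2026 for three projects performs 30 full-batch day inserts plus 3 single-project inserts for May 10.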

Why

The production deploy following the migration-job scheduling fix still spends a long time in Helm. Diagnostics from the previous failed production deployment show that the migration hook did run, but failed in 20260515100000_create_test_case_runs_recent_buckets_per_case_mvs.exs while backfilling test_case_runs_recent_100_per_case.

ClickHouse exhausted the query memory limit on the May 2026 partition:

Code: 241. DB::Exception: Query memory limit exceeded ... maximum: 5.59 GiB. (MEMORY_LIMIT_EXCEEDED)

The existing retry loop retried the same oversized query, so the Job consumed all backoff attempts and Helm eventually timed out.
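Retrying the identical query is unlikely to help once it has hit the per-query limit, so the retry decision needs to distinguish this error from transient ones. A minimal Python sketch, assuming the error surfaces as a message in the "Code: NNN. DB::Exception: ..." form shown in the log above; clickhouse_error_code and next_action are hypothetical helpers, not part of the migration or any ClickHouse client library.

```python
import re
from typing import Optional

MEMORY_LIMIT_EXCEEDED = 241  # error code shown in the failed deploy log

# Matches the leading "Code: 241." of a ClickHouse exception message.
_CODE_RE = re.compile(r"Code:\s*(\d+)\.")


def clickhouse_error_code(message: str) -> Optional[int]:
    """Extract the numeric code from a 'Code: NNN. DB::Exception: ...' message."""
    match = _CODE_RE.search(message)
    return int(match.group(1)) if match else None


def next_action(message: str) -> str:
    """Hypothetical policy: split on memory errors, retry everything else."""
    if clickhouse_error_code(message) == MEMORY_LIMIT_EXCEEDED:
        return "split"  # resubmitting the same query would hit the same limit
    return "retry"
```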

Why this approach

Raising the memory limit alone is risky: the same log also showed memory pressure across the whole ClickHouse server. The migration already writes mergeable AggregateFunction(groupArraySorted(...)) states, so splitting a failed chunk by day, and then by project, keeps the resulting aggregate semantically mergeable while reducing peak memory for the problematic chunks.
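Why splitting is safe can be checked with a small model of the aggregate. This Python sketch assumes ClickHouse's documented groupArraySorted(N) behavior (the N smallest values in ascending order) and models a single test case's state as a plain list; group_array_sorted and merge_states are illustrative names, and the issue does not show how the migration orders runs by recency. Merging per-day partial states (as ClickHouse does with the -Merge combinator) gives the same result as aggregating the whole month at once.

```python
import heapq
from itertools import chain
from typing import Iterable, List


def group_array_sorted(n: int, values: Iterable[int]) -> List[int]:
    """Model of groupArraySorted(n): the n smallest values, ascending."""
    return heapq.nsmallest(n, values)


def merge_states(n: int, *states: List[int]) -> List[int]:
    """Merge partial states: the n smallest values of their union, ascending.

    Any value in the global top n is also in the top n of its own partition,
    so no partial state ever drops a value the full aggregate would keep.
    """
    return heapq.nsmallest(n, chain.from_iterable(states))
```

The same argument applies recursively, so per-project states inside a day can be merged into that day's state before merging days.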

The fast path is unchanged for chunks that fit, so the migration does not become a day-by-day backfill for every partition.

Impact

Future deploy attempts should be able to complete the historical backfill without repeatedly failing the same oversized ClickHouse query. Failed previous attempts are safe to rerun because this migration drops and recreates the bucket tables at the start of up/0.
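The rerun-safety argument can be illustrated with a toy model. This is a minimal Python sketch, assuming a dict stands in for the database and a list of row lists stands in for the backfill chunks; up and fail_at are hypothetical. Because the table is dropped and recreated before any chunk is inserted, rows left behind by a failed attempt never survive into the next one.

```python
from typing import Dict, List, Optional


def up(db: Dict[str, List[int]], chunks: List[List[int]],
       fail_at: Optional[int] = None) -> None:
    """Toy migration: drop and recreate the bucket table, then backfill chunks."""
    db.pop("buckets", None)  # like DROP TABLE IF EXISTS
    db["buckets"] = []       # like CREATE TABLE
    for i, chunk in enumerate(chunks):
        if i == fail_at:
            raise RuntimeError("simulated MEMORY_LIMIT_EXCEEDED")
        db["buckets"].extend(chunk)
```

Without the drop step, a rerun after a partial failure would insert the already-backfilled chunks a second time.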

Validation

  • Parsed the migration file with Elixir Code.string_to_quoted!/1.
  • Ran a basic mix format with a temporary formatter config that avoids dependency and plugin imports, because this worktree has no server deps installed.
  • Ran git diff --check.

Full project mix format was not run because server/deps is absent in this worktree and the project formatter imports ecto_sql, phoenix, and formatter plugins.

Comments
tuist-atlas[bot] Jun 12, 2026

This fix is now available in server@1.210.3. Update to this version to get the chunked backfill behavior that handles memory pressure gracefully.