Hive
perf(server): cut FINAL scans on ClickHouse test_case_runs and test_runs
GitHub issue · Closed
Summary
Three call sites that account for the bulk of CPU and I/O on production ClickHouse were rewritten to avoid full-part FINAL scans. Each query plan was validated against the local ClickHouse via EXPLAIN indexes = 1.
fetch_full_test_case_runs/1 (server/lib/tuist/tests.ex)
Drops FINAL on test_case_runs. Re-inserted rows (flaky flag updates) preserve the (project_id, test_case_id, ran_at, id) ORDER BY tuple, so ORDER BY inserted_at DESC + Enum.uniq_by(& &1.id) returns the same logical row without the per-key in-memory merge across all matching parts. The slim MV already narrows the result to a page-sized set of ids, keeping the Elixir dedup small. Mirrors the pattern already used in mark_test_case_runs_as_flaky.
expire_stale_in_progress_test_runs/0 (server/lib/tuist/tests.ex)
test_runs was converted to ReplacingMergeTree by migration 20260114100000, so FINAL is not a no-op. But the local EXPLAIN shows FINAL triggers PrimaryKeyExpand from 1/15 granules back to 15/15, defeating the idx_status set(3) skip index. Split into:
- Candidate-id scan without FINAL —
idx_statusprunes to a single granule. - FINAL resolve step over the small id set — uses the
proj_by_idprojection ordered byid, and re-checksstatus = 'in_progress'andinserted_at < cutoffafter the merge so runs that have already completed are filtered out.
test_runs_metrics/2 (server/lib/tuist/tests/analytics.ex)
Switched the count from test_case_runs (ORDER BY (project_id, test_case_id, ran_at, id) — test_run_id is only a bloom_filter) to the test_case_runs_by_test_run MV (ORDER BY (test_run_id, ran_at, id)). EXPLAIN confirms a PrimaryKey binary search on test_run_id (1/1 granule) versus the previous 24-part scan. count(DISTINCT id) handles the MV’s ReplacingMergeTree duplicates without paying for FINAL.
Local EXPLAIN deltas
| Pattern | Before | After |
|---|---|---|
expire_stale_in_progress_test_runs candidate scan |
15/15 granules (PrimaryKeyExpand after FINAL) | 1/15 granules (idx_status) |
expire_stale_in_progress_test_runs resolve step |
n/a | FINAL constrained to candidate ids via proj_by_id |
test_runs_metrics |
3/24 parts via project_id PK + bloom_filter | 1/1 granule via test_run_id PK |
groupArrayLastMerge (the flakiness monitor) is intentionally untouched — it is being investigated separately.
How to test locally
mise run clickhouse:startmix test test/tuist/tests_test.exs:8533 test/tuist/tests/analytics_test.exs:206 test/tuist/tests_test.exs:660 test/tuist/tests_test.exs:873— all 11 targeted tests pass.- Optionally re-run the EXPLAIN snippets above against your local
tuist_developmentdatabase to confirm the index-selection deltas.
🤖 Generated with Claude Code
No GitHub comments yet.