Hive Hive
Sign in

perf(server): cut FINAL scans on ClickHouse test_case_runs and test_runs

GitHub issue · Closed

Metadata
Source
tuist/tuist #10881
Updated
Jun 24, 2026
Details

Summary

Three call sites that account for the bulk of CPU and I/O on production ClickHouse were rewritten to avoid full-part FINAL scans. Each query plan was validated against the local ClickHouse via EXPLAIN indexes = 1.

fetch_full_test_case_runs/1 (server/lib/tuist/tests.ex)

Drops FINAL on test_case_runs. Re-inserted rows (flaky flag updates) preserve the (project_id, test_case_id, ran_at, id) ORDER BY tuple, so ORDER BY inserted_at DESC + Enum.uniq_by(& &1.id) returns the same logical row without the per-key in-memory merge across all matching parts. The slim MV already narrows the result to a page-sized set of ids, keeping the Elixir dedup small. Mirrors the pattern already used in mark_test_case_runs_as_flaky.

expire_stale_in_progress_test_runs/0 (server/lib/tuist/tests.ex)

test_runs was converted to ReplacingMergeTree by migration 20260114100000, so FINAL is not a no-op. But the local EXPLAIN shows FINAL triggers PrimaryKeyExpand from 1/15 granules back to 15/15, defeating the idx_status set(3) skip index. Split into:

  1. Candidate-id scan without FINAL — idx_status prunes to a single granule.
  2. FINAL resolve step over the small id set — uses the proj_by_id projection ordered by id, and re-checks status = 'in_progress' and inserted_at < cutoff after the merge so runs that have already completed are filtered out.

test_runs_metrics/2 (server/lib/tuist/tests/analytics.ex)

Switched the count from test_case_runs (ORDER BY (project_id, test_case_id, ran_at, id)test_run_id is only a bloom_filter) to the test_case_runs_by_test_run MV (ORDER BY (test_run_id, ran_at, id)). EXPLAIN confirms a PrimaryKey binary search on test_run_id (1/1 granule) versus the previous 24-part scan. count(DISTINCT id) handles the MV’s ReplacingMergeTree duplicates without paying for FINAL.

Local EXPLAIN deltas

Pattern Before After
expire_stale_in_progress_test_runs candidate scan 15/15 granules (PrimaryKeyExpand after FINAL) 1/15 granules (idx_status)
expire_stale_in_progress_test_runs resolve step n/a FINAL constrained to candidate ids via proj_by_id
test_runs_metrics 3/24 parts via project_id PK + bloom_filter 1/1 granule via test_run_id PK

groupArrayLastMerge (the flakiness monitor) is intentionally untouched — it is being investigated separately.

How to test locally

  1. mise run clickhouse:start
  2. mix test test/tuist/tests_test.exs:8533 test/tuist/tests/analytics_test.exs:206 test/tuist/tests_test.exs:660 test/tuist/tests_test.exs:873 — all 11 targeted tests pass.
  3. Optionally re-run the EXPLAIN snippets above against your local tuist_development database to confirm the index-selection deltas.

🤖 Generated with Claude Code

Comments

No GitHub comments yet.