fix(server): match suite-granularity shard timing by module-qualified name

GitHub issue · Closed

Metadata
Source: tuist/tuist #11469
Updated: Jun 24, 2026
Domains: Compute

Details

What changed

Tuist.Shards.fetch_timing_data/2 for suite granularity now joins test_suite_runs to test_module_runs and keys the historical timing map on concat(module.name, '/', suite.name) instead of the bare suite name.

Why

Suite-granularity shard plans collapsed to a single shard regardless of the configured per-shard duration target, so a job that should fan out across many shards ran as one long shard.

Root cause

The two sides of the timing lookup built keys differently:

  • The CLI sends each suite as Module/Suite (blueprintName/className), e.g. AppTests/LoginTests (ShardPlanService.swift).
  • The server grouped test_suite_runs on the bare name column, producing a map keyed by LoginTests — no module prefix.

As a result, every Map.get(timing_data, "AppTests/LoginTests", default) call in assign_durations/3 missed and fell back to the median duration. With every suite assigned the same small median, the bin packer’s total estimate came out roughly 100x too low (about 101s instead of about 2.8h for the affected project described under Validation), and BinPacker.determine_shard_count/3 computed ceil(total / target) = 1.
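
The key mismatch can be modeled in a few lines. This is a Python sketch, not the server's Elixir code: the function name, the suite names other than AppTests/LoginTests, and the durations are hypothetical, and it assumes the fallback is the median of the known timings.

```python
from statistics import median

# Hypothetical historical timings (ms), keyed as the old query keyed them:
# by bare suite name.
bare_timing = {"LoginTests": 90_000, "SignupTests": 1_000}

# Keys as the CLI sends them: Module/Suite.
requested = ["AppTests/LoginTests", "AppTests/SignupTests"]


def assign_durations_sketch(suites, timing):
    """Look up each suite; on a miss, fall back to the median known timing.

    Assumption: the fallback is the median of the timing map's values.
    """
    fallback = median(timing.values()) if timing else 0
    return {suite: timing.get(suite, fallback) for suite in suites}


# Old behavior: no requested key exists in the map, so every lookup misses.
old = assign_durations_sketch(requested, bare_timing)

# Fixed behavior: the same data keyed by module-qualified name.
qualified_timing = {f"AppTests/{name}": ms for name, ms in bare_timing.items()}
new = assign_durations_sketch(requested, qualified_timing)
```

With bare keys, both suites receive the same fallback value and the per-suite differences are lost. With qualified keys, each suite gets its own recorded duration.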

Module granularity was never affected: module names are stored and sent in the same bare form, so those lookups match. That is why module plans fanned out correctly while every suite plan collapsed to one shard.

Why this fix

Keying the timing query on the module-qualified name makes the server’s map match what the CLI sends. A join (rather than stripping the module prefix off the CLI input) is also more correct: the previous bare-name grouping conflated same-named suites that live in different modules.
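
A small model of the conflation problem, again as a Python sketch rather than the actual Ecto query. The rows, the KitTests module, and the use of a mean as the per-key aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical suite runs: (module name, suite name, duration in ms).
runs = [
    ("AppTests", "LoginTests", 90_000),
    ("AppTests", "LoginTests", 88_000),
    ("KitTests", "LoginTests", 2_000),
]


def timing_by(key_fn, rows):
    """Group rows by key_fn(module, suite) and average durations per key."""
    groups = defaultdict(list)
    for module, suite, ms in rows:
        groups[key_fn(module, suite)].append(ms)
    return {key: mean(values) for key, values in groups.items()}


# Bare-name grouping merges the two unrelated LoginTests suites.
bare = timing_by(lambda module, suite: suite, runs)

# Module-qualified grouping keeps them apart.
qualified = timing_by(lambda module, suite: f"{module}/{suite}", runs)
```

Under bare-name grouping, a slow suite and a fast suite that share a name are blended into one figure that describes neither. Stripping the module prefix from the CLI input would have made lookups hit, but against that blended figure.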

Validation

  • Added a regression test (matches suite timing data by module-qualified name) that seeds two suites in the same module with very different durations and asserts each is assigned its real per-suite timing. Verified it fails against the old code (returns the median, 45500, instead of 90000) and passes with the fix.
  • Full test/tuist/shards_test.exs suite green (24/24). mix credo and mix format clean on the changed file.
  • Empirically confirmed against production data for an affected project: the broken lookup estimated a 783-suite plan at ~101s total (→ 1 shard), while the module-qualified join matches all 783 suites against real CI timing and estimates ~2.8h total (→ ~17 shards needed at a 10-minute target).

Note for affected users

The needed shard count is still clamped to --shard-max (default 10) in BinPacker.determine_shard_count/3. To hit a tight per-shard target on a large suite set, --shard-max must be raised accordingly; otherwise shards stay larger than the target.
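
The effect of the clamp can be worked through with the figures from Validation. This is a Python sketch of the arithmetic only: the function name and signature are hypothetical, and BinPacker.determine_shard_count/3 may apply further rules not shown here.

```python
import math


def shard_count_sketch(total_seconds, target_seconds, shard_max=10):
    """ceil(total / target), clamped to at most shard_max."""
    needed = math.ceil(total_seconds / target_seconds)
    return min(needed, shard_max)


target = 10 * 60        # 10-minute per-shard target, in seconds
broken_total = 101      # ~101 s estimate from the broken lookup
fixed_total = 10_080    # ~2.8 h estimate after the fix, in seconds

broken = shard_count_sketch(broken_total, target)
fixed_default = shard_count_sketch(fixed_total, target)
fixed_raised = shard_count_sketch(fixed_total, target, shard_max=20)

# Average shard length when the count is clamped to the default maximum.
per_shard_minutes = fixed_total / fixed_default / 60
```

With the default maximum of 10, the fixed estimate still yields only 10 shards, averaging about 16.8 minutes each. Raising the maximum to 17 or more lets the plan reach the 10-minute target.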
