
fix(kura): drop bootstrap whole-dir hard check that reintroduced the budget stall

GitHub issue · Closed

Metadata

Source: tuist/tuist #11392
Updated: Jun 24, 2026
Domains: Kura

Details

What

Removes the ensure_tmp_dir_capacity whole-dir hard check from the bootstrap path (re-added in #11375’s review-findings pass), restoring the held-reservation-only behaviour.

Why (production incident)

kura@0.10.5 was hotfixed onto the prod kura-tuist-* mesh to recover the stuck -2 replicas, but bootstrap still failed with the original error:

bootstrap … failed: tmp dir budget exhausted: 8146276336 bytes staged, 784550880 bytes requested, 8589934592 bytes allowed

That string comes from ensure_tmp_dir_capacity. The held reservation (the real fix) bounds concurrent staging by waiting when the budget is full, whereas the re-added whole-dir check fails the artifact when the dir is full. So once bootstrap legitimately fills the 8 GiB budget with concurrent staging, the next artifact is rejected and the whole bootstrap fails, which is exactly the stall the reservation exists to prevent.
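To make the failure concrete, here is a minimal sketch of a fail-fast whole-dir check fed with the numbers from the log line. The function name and signature are hypothetical stand-ins; the real ensure_tmp_dir_capacity is not shown in this PR and may differ.

```rust
/// Hypothetical stand-in for a whole-dir hard check. It fails immediately
/// when staged + requested exceeds the budget instead of waiting for space.
/// This is NOT the real `ensure_tmp_dir_capacity` signature (an assumption).
fn whole_dir_check(staged: u64, requested: u64, allowed: u64) -> Result<(), String> {
    match staged.checked_add(requested) {
        // Enough room: the artifact may be staged.
        Some(total) if total <= allowed => Ok(()),
        // Over budget (or arithmetic overflow): reject the artifact outright.
        _ => Err(format!(
            "tmp dir budget exhausted: {staged} bytes staged, \
             {requested} bytes requested, {allowed} bytes allowed"
        )),
    }
}
```

With the production values, 8146276336 + 784550880 = 8930827216 bytes, which is above the 8589934592 allowed, so the check errors even though only 443658256 bytes of headroom were missing and would have freed up as soon as a concurrent staging finished.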

The finding-1 regression test missed this because it used a budget of 2× the artifact size, so only two artifacts stage at once and the dir never overshoots. Production has a large budget relative to artifact size (8 GiB vs ~784 MB), so many stage concurrently, fill the budget, and trip the check.
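The gap between the test and production shapes can be illustrated with a rough ceiling on concurrent staging. This is a sketch only: the helper name is hypothetical, and it assumes every artifact has the same size, which the PR does not claim.

```rust
/// Rough upper bound on how many artifacts can hold a reservation at once,
/// assuming uniform artifact size (a simplification for illustration).
/// Returns None when the artifact size is zero.
fn max_concurrent_artifacts(budget_bytes: u64, artifact_bytes: u64) -> Option<u64> {
    budget_bytes.checked_div(artifact_bytes)
}
```

Under that assumption, a budget of 2× the artifact size allows 2 concurrent stagings, while 8589934592 bytes of budget against 784550880-byte artifacts allows 10, so the directory sits much closer to its limit when the next artifact asks for space.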

Fix

  • Remove the ensure_tmp_dir_capacity call + its now-unused import.
  • Remove the finding-1 test (bootstrap_accounts_for_non_bootstrap_tmp_usage) that asserted the hard check.
  • Keep the held reservation and finding-2’s streaming cap. The reservation alone bounds peak staging to the budget and waits rather than failing, so an account larger than the budget bootstraps in waves — covered by concurrent_peer_bootstraps_converge_and_bound_peak_tmp and bootstrap_succeeds_when_total_artifacts_exceed_tmp_budget.
  • A code comment now documents why a whole-dir hard check must not live here. The finding-1 concern (non-bootstrap tmp occupants) is negligible: the node is out of the Service while bootstrapping.
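For readers unfamiliar with the pattern, a held reservation that waits rather than fails can be sketched with a mutex and a condition variable. All type and method names below are hypothetical; kura's actual implementation is not shown in this PR and may well be async rather than thread-based.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical byte budget shared by all staging tasks.
pub struct TmpBudget {
    allowed: u64,
    reserved: Mutex<u64>,
    freed: Condvar,
}

/// Held reservation: its bytes return to the budget when it is dropped.
pub struct Reservation {
    budget: Arc<TmpBudget>,
    bytes: u64,
}

impl TmpBudget {
    pub fn new(allowed: u64) -> Arc<Self> {
        Arc::new(Self { allowed, reserved: Mutex::new(0), freed: Condvar::new() })
    }

    /// Blocks until `bytes` fit in the budget; it never returns an error.
    /// Sketch-only choice: a request larger than the whole budget is clamped
    /// so it can run alone instead of waiting forever.
    pub fn reserve(self: &Arc<Self>, bytes: u64) -> Reservation {
        let bytes = bytes.min(self.allowed);
        let mut reserved = self.reserved.lock().unwrap();
        while *reserved + bytes > self.allowed {
            // Budget is full: wait for another reservation to be released.
            reserved = self.freed.wait(reserved).unwrap();
        }
        *reserved += bytes;
        Reservation { budget: Arc::clone(self), bytes }
    }

    /// Bytes currently held by live reservations.
    pub fn reserved(&self) -> u64 {
        *self.reserved.lock().unwrap()
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        let mut reserved = self.budget.reserved.lock().unwrap();
        *reserved -= self.bytes;
        // Wake all waiters so the next wave can re-check the budget.
        self.budget.freed.notify_all();
    }
}
```

Because reserve blocks instead of returning an error, a set of artifacts whose total exceeds the budget is staged in waves: each task holds its Reservation while staging, and dropping it admits the next waiter. The sum of live reservations never exceeds the budget, which is the bound the two remaining tests are described as covering.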

Validation

Full kura suite 257 passing, fmt + clippy clean. Destined for kura@0.10.6 and a prod kura hotfix.

🤖 Generated with Claude Code
