fix(kura): drop bootstrap whole-dir hard check that reintroduced the budget stall
GitHub issue · Closed
What
Removes the ensure_tmp_dir_capacity whole-dir hard check from the bootstrap path (re-added in #11375’s review-findings pass), restoring the held-reservation-only behaviour.
Why (production incident)
kura@0.10.5 was hotfixed onto the prod kura-tuist-* mesh to recover the stuck -2 replicas, but bootstrap still failed with the original error:
bootstrap … failed: tmp dir budget exhausted: 8146276336 bytes staged, 784550880 bytes requested, 8589934592 bytes allowed
That string is ensure_tmp_dir_capacity’s. The held reservation (the real fix) bounds concurrent staging by waiting when the budget is full; the re-added whole-dir check fails the artifact when the dir is full. So once bootstrap legitimately fills the 8 GiB budget with concurrent staging, the next artifact is rejected and the whole bootstrap fails — exactly the stall the reservation exists to prevent.
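As a rough illustration of why the quoted log line is a failure, here is a minimal sketch of a whole-dir hard check. The function name `whole_dir_check` and the `staged + requested > allowed` condition are assumptions inferred from the error text, not kura's actual implementation.

```rust
/// Hypothetical stand-in for a whole-dir hard check (not kura's real code).
/// Assumption: the check rejects a request when the bytes already staged
/// plus the bytes requested would exceed the allowed budget.
fn whole_dir_check(staged: u64, requested: u64, allowed: u64) -> Result<(), String> {
    match staged.checked_add(requested) {
        // The request fits: staging may proceed.
        Some(total) if total <= allowed => Ok(()),
        // The request overshoots (or the sum overflows): fail immediately
        // instead of waiting for space to be released.
        _ => Err(format!(
            "tmp dir budget exhausted: {} bytes staged, {} bytes requested, {} bytes allowed",
            staged, requested, allowed
        )),
    }
}
```

With the numbers from the incident, `whole_dir_check(8146276336, 784550880, 8589934592)` returns `Err`, because 8146276336 + 784550880 = 8930827216 exceeds the 8589934592-byte (8 GiB) budget. A check of this shape fails the artifact at the moment a waiting reservation would simply have blocked.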
The finding-1 regression test missed this because it used a budget of 2× the artifact size, so only two artifacts stage at once and the dir never overshoots. Production has a large budget relative to artifact size (8 GiB vs ~784 MB), so many stage concurrently, fill the budget, and trip the check.
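The difference between the test and production can be put in numbers. The sketch below treats every artifact as the same size, which is a simplification (the production log shows sizes vary), and `max_concurrent` is a hypothetical helper used only for this arithmetic.

```rust
/// Hypothetical helper: how many equally sized artifacts fit in the
/// budget at the same time. Assumes `artifact` is non-zero.
fn max_concurrent(budget: u64, artifact: u64) -> u64 {
    budget / artifact
}

/// Artifact size taken from the incident log line, in bytes.
const ARTIFACT: u64 = 784_550_880;
/// The production budget from the same log line: 8 GiB.
const PROD_BUDGET: u64 = 8 * 1024 * 1024 * 1024;
```

Under that assumption, a budget of 2× the artifact size admits 2 concurrent stagings, while the 8 GiB production budget admits 10, so production gets much closer to a full directory than the regression test did.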
Fix
- Remove the ensure_tmp_dir_capacity call + its now-unused import.
- Remove the finding-1 test (bootstrap_accounts_for_non_bootstrap_tmp_usage) that asserted the hard check.
- Keep the held reservation and finding-2’s streaming cap. The reservation alone bounds peak staging to the budget and waits rather than failing, so an account larger than the budget bootstraps in waves. This is covered by concurrent_peer_bootstraps_converge_and_bound_peak_tmp and bootstrap_succeeds_when_total_artifacts_exceed_tmp_budget.
- A code comment now documents why a whole-dir hard check must not live here. The finding-1 concern (non-bootstrap tmp occupants) is negligible: the node is out of the Service while bootstrapping.
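The wait-rather-than-fail behaviour of a held reservation can be sketched with a mutex and a condition variable. This is an illustrative model only: `TmpBudget`, `Reservation`, `reserve`, and `stage_all` are hypothetical names, and kura's real reservation may be built differently (for example on an async semaphore).

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical byte budget shared by all staging tasks.
struct TmpBudget {
    state: Mutex<(u64, u64)>, // (bytes currently reserved, peak bytes reserved)
    freed: Condvar,
    allowed: u64,
}

/// Held for as long as an artifact is staged; releases its bytes on drop.
struct Reservation {
    budget: Arc<TmpBudget>,
    bytes: u64,
}

impl TmpBudget {
    fn new(allowed: u64) -> Arc<Self> {
        Arc::new(Self { state: Mutex::new((0, 0)), freed: Condvar::new(), allowed })
    }

    fn peak(&self) -> u64 {
        self.state.lock().unwrap().1
    }
}

/// Blocks until `bytes` fit under the budget, then holds them.
fn reserve(budget: &Arc<TmpBudget>, bytes: u64) -> Reservation {
    // In this sketch a single artifact larger than the budget could never fit.
    assert!(bytes <= budget.allowed, "artifact larger than the whole budget");
    let mut st = budget.state.lock().unwrap();
    while st.0 + bytes > budget.allowed {
        // Budget is full: wait for another reservation to be released.
        st = budget.freed.wait(st).unwrap();
    }
    st.0 += bytes;
    st.1 = st.1.max(st.0);
    Reservation { budget: Arc::clone(budget), bytes }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        let mut st = self.budget.state.lock().unwrap();
        st.0 -= self.bytes;
        self.budget.freed.notify_all();
    }
}

/// Stages `count` artifacts of `size` bytes concurrently; returns the peak
/// number of bytes reserved at any one time.
fn stage_all(allowed: u64, size: u64, count: usize) -> u64 {
    let budget = TmpBudget::new(allowed);
    let handles: Vec<_> = (0..count)
        .map(|_| {
            let budget = Arc::clone(&budget);
            thread::spawn(move || {
                let _held = reserve(&budget, size);
                // Staging work would happen here while the reservation is held.
                thread::yield_now();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    budget.peak()
}
```

For example, `stage_all(30, 10, 12)` stages 120 bytes of artifacts through a 30-byte budget: every thread finishes, and the reported peak never exceeds 30. That is the "bootstraps in waves" property in miniature: a full budget delays the next artifact instead of failing it.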
Validation
Full kura suite 257 passing, fmt + clippy clean. Destined for kura@0.10.6 and a prod kura hotfix.
🤖 Generated with Claude Code