
fix(infra,kura): kubelet InternalIP address + bound Kura bootstrap tmp staging

GitHub issue · Closed

Metadata
Source: tuist/tuist #11375
Updated: Jun 24, 2026
Domains: Kura
Details

Combines two changes that we want to deploy together so the Kura bootstrap fix can be observed end to end: the InternalIP flip makes the logs of the stuck EM/macOS-PN kura pods readable, and the Kura fix lets those pods finish bootstrap. Absorbs #11380.

1. Kura: bound bootstrap tmp-dir staging (the prod fix)

Freshly started kura pods for a large account never became Ready (GET /ready returned 503 forever) after the StatefulSets rolled. Bootstrap stages each peer’s artifacts into a shared tmp dir. For an account whose cache exceeds KURA_TMP_DIR_MAX_BYTES, staging exhausted the budget, failed, and retried from scratch, so readiness (gated on bootstrap) never settled:

bootstrap from <peer> failed: tmp dir budget exhausted: 7887454208 bytes staged, 784550880 bytes remaining

The root cause is unbounded concurrent staging, not a leak: up to DEFAULT_BOOTSTRAP_MAX_CONCURRENT_PEERS bootstrap tasks each streamed into the same tmp dir, guarded only by a point-in-time, non-reserving capacity check. Peak tmp usage scaled with peers × artifact size, not with the budget. The 0.10.4 segment-ring change is not at fault (it tests correct in isolation); the new pods just happened to roll onto 0.10.4 and bootstrap the now-large cache. A fresh 0.10.3 pod would fail identically, so a rollback would not have helped.
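To make the "peers × artifact size" scaling concrete, here is a minimal model of the worst-case interleaving, in which every task runs its check before any task has written a byte. The function names and the two-phase structure are hypothetical simplifications for illustration, not Kura's code.

```rust
/// Point-in-time check: looks only at bytes already on disk and holds nothing.
/// (Hypothetical stand-in for a non-reserving capacity check.)
fn has_capacity(on_disk: u64, max_bytes: u64) -> bool {
    on_disk < max_bytes
}

/// Models the worst case: all peers pass the check first, then all of them stream.
/// Returns the resulting tmp dir usage in bytes.
pub fn peak_with_point_in_time_check(peers: u64, artifact_bytes: u64, max_bytes: u64) -> u64 {
    let mut on_disk = 0u64;

    // Phase 1: every task checks while the dir is still empty, so every task is admitted.
    let admitted = (0..peers)
        .filter(|_| has_capacity(on_disk, max_bytes))
        .count() as u64;

    // Phase 2: every admitted task streams its artifact into the shared dir.
    on_disk += admitted * artifact_bytes;
    on_disk
}
```

With 8 peers, 300-byte artifacts, and a 1000-byte budget, the model ends at 2400 bytes on disk: the check never said no, because nothing it could see was ever held.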

Fix: a held byte reservation (TmpBudget + RAII TmpReservation) — each artifact reserves before streaming and releases on drop after persist; reserve() waits instead of failing when full. Peak tmp is now O(in-flight), not O(account size). The waiter registers (Notified::enable) before the capacity check so a release racing the check is not a lost wakeup.
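The reservation pattern can be sketched as follows. This is a synchronous, std-only stand-in: the mention of Notified::enable suggests the real code is async, whereas this sketch blocks on a Condvar, whose wait-under-lock semantics rule out the lost wakeup by construction. Only the names TmpBudget, TmpReservation, and reserve() come from the description above; the fields, reserved_bytes, peak_reserved, and the Option return are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Shared byte budget for the tmp dir (hypothetical shape).
pub struct TmpBudget {
    max_bytes: u64,
    reserved: Mutex<u64>,
    released: Condvar,
}

/// RAII guard: the bytes stay reserved until this value is dropped.
pub struct TmpReservation {
    budget: Arc<TmpBudget>,
    bytes: u64,
}

impl TmpBudget {
    pub fn new(max_bytes: u64) -> Arc<Self> {
        Arc::new(Self { max_bytes, reserved: Mutex::new(0), released: Condvar::new() })
    }

    /// Waits until `bytes` fit. Returns None only when the request exceeds
    /// the whole budget and therefore could never be satisfied.
    pub fn reserve(self: &Arc<Self>, bytes: u64) -> Option<TmpReservation> {
        if bytes > self.max_bytes {
            return None;
        }
        let mut reserved = self.reserved.lock().unwrap();
        // The check and the wait happen under the same lock, so a release
        // cannot slip in between them unnoticed.
        while *reserved + bytes > self.max_bytes {
            reserved = self.released.wait(reserved).unwrap();
        }
        *reserved += bytes;
        Some(TmpReservation { budget: Arc::clone(self), bytes })
    }

    pub fn reserved_bytes(&self) -> u64 {
        *self.reserved.lock().unwrap()
    }
}

impl Drop for TmpReservation {
    fn drop(&mut self) {
        let mut reserved = self.budget.reserved.lock().unwrap();
        *reserved -= self.bytes;
        self.budget.released.notify_all();
    }
}

/// Runs `peers` concurrent stagers and returns the highest reserved total observed.
pub fn peak_reserved(peers: usize, artifact_bytes: u64, max_bytes: u64) -> u64 {
    let budget = TmpBudget::new(max_bytes);
    let peak = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..peers)
        .map(|_| {
            let budget = Arc::clone(&budget);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                let _held = budget.reserve(artifact_bytes).expect("artifact fits the budget");
                peak.fetch_max(budget.reserved_bytes(), Ordering::SeqCst);
                // Stand-in for streaming and persisting the artifact.
                thread::sleep(Duration::from_millis(5));
                // `_held` drops here, releasing the bytes and waking waiters.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling peak_reserved(8, 300, 1000) never reports more than 900 bytes, because at most three 300-byte reservations fit at once and the remaining stagers wait rather than fail.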

Review hardening (two findings):

  • Other tmp occupants: the reservation only tracked bootstrap stagers, so concurrent uploads, multipart parts, and leftovers could push the shared tmp dir past budget on top of a full bootstrap reservation. Re-added the whole-dir ensure_tmp_dir_capacity guard in the bootstrap path: the reservation bounds bootstrap against itself, and the guard bounds the on-disk total against all occupants. The guard cannot reintroduce the original failure, because the reservation already keeps bootstrap’s own files within budget.
  • Reservation vs staged body: the reservation is now sized from max(manifest.size, declared Content-Length) and the streamed body is capped at the reserved size, so an inconsistent peer serving a body larger than its manifest is rejected instead of overrunning the budget.
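The second finding can be sketched as two small helpers: one that sizes the reservation, and one that stops streaming as soon as the body would exceed it. Both function names, the Option for a missing Content-Length, and the 8 KiB buffer are hypothetical; the sketch uses blocking std::io and does not retry interrupted reads.

```rust
use std::io::{self, Read, Write};

/// Reserve for the larger of the manifest size and the declared Content-Length.
/// A chunked body carries no Content-Length, modeled here as None.
pub fn reservation_size(manifest_size: u64, content_length: Option<u64>) -> u64 {
    manifest_size.max(content_length.unwrap_or(0))
}

/// Copies `body` into `out`, rejecting the body before writing any chunk
/// that would take the total past `reserved` bytes.
pub fn stage_capped<R: Read, W: Write>(mut body: R, out: &mut W, reserved: u64) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut written = 0u64;
    loop {
        let n = body.read(&mut buf)?;
        if n == 0 {
            return Ok(written); // end of body, within the reservation
        }
        if written + n as u64 > reserved {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "peer body exceeds reserved size",
            ));
        }
        out.write_all(&buf[..n])?;
        written += n as u64;
    }
}
```

Checking before each write, rather than after, means a rejected body never occupies more tmp space than was reserved for it.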

Validation: full kura suite 258 passing, fmt + clippy clean. Regression tests: 8 concurrent slow peers stay under budget (fails against pre-fix code with the exact prod error); an oversized chunked body is rejected; a pre-filled tmp dir blocks a bootstrap the reservation alone would admit.

2. infra: kubelet InternalIP address (the observability flip)

Adds InternalIP to kubeletPreferredAddressTypes on all three workload clusters so the apiserver can reach kubelets on nodes with no ExternalIP and an unresolvable Hostname (the Scaleway Elastic Metal kura node + the macOS-PN fleet). Without it, kubectl logs/exec on those nodes fails with no-such-host — which is exactly why the stuck scw-fr-par EM pod’s logs were unreadable during this incident. Flipping the variable re-renders the apiserver patch and rolls each cluster’s control plane (applied by mgmt-cluster-apply on merge).

Deploy ordering

On merge to main: release.yml publishes a new ghcr.io/tuist/kura:<version> (kura-should-release picks up the fix(kura) commits) and mgmt-cluster-apply flips InternalIP. The Kura runtime image must be published before any deploy resolves the new kura@ tag — deploy staging with an explicit kura_runtime_image_tag once the release job’s image push completes.

🤖 Generated with Claude Code
