fix(infra): unwedge the xcresult-processor image release (tart-kubelet GC + workflow cleanup)
GitHub issue · Closed
Combines the two follow-ups to the xcresult-processor image release wedge into one PR (supersedes #11427).
Background
The xcresult-processor push failed ~10× in a row with Error: The file "nvram.bin" doesn't exist. The cause was not the workflow: the builder-fleet hosts run tart-kubelet alongside the GitHub Actions runner, and tart-kubelet’s orphan-VM GC was reaping the in-flight Packer build VM mid-tart push. The fix flag --disable-vm-gc was added in #11072, but the running builders had a stale, flag-less plist. I patched both live builders out-of-band to unblock the release (verified: a dispatch build then pushed green on attempt 1).
This PR makes the fix durable and cleans up the workflow.
1. Durability — stop binary rolls from stripping --disable-vm-gc
renderLaunchdPlist keys the flag off cfg.GHActionsRunner != nil:
- Bootstrap (Run) sets GHActionsRunner → flag rendered. ✅
- Drift-update (UpdateTartKubelet), fired on every tart-kubelet binary roll, re-renders the plist from a Config that never sets GHActionsRunner. This is deliberate: it doesn't reinstall the runner, and resolving the config would mint a fresh registration token each loop → flag stripped. ❌
Since binaries stream independently of full bootstrap rolls (#11065), a host ends up with a fresh binary and a flag-less plist — exactly the wedged state.
Fix: add an explicit Config.DisableVMGC bool. renderLaunchdPlist renders the flag when GHActionsRunner != nil (bootstrap, unchanged) or DisableVMGC is true (update path). The reconciler sets DisableVMGC: machine.Spec.GHActionsRunner != nil on the update Config. Additive and backward-compatible; bootstrap path and its tests untouched.
2. Workflow cleanup — drop the nvram push workarounds
#11417 / #11419 chased the wrong layer, adding TART_NO_AUTO_PRUNE and a clone-the-base “restore nvram.bin” step to the push. With the GC reaping the whole bundle, the restore’s cp failed and the base-clone only added disk pressure. This reverts both push steps to a plain 3× retry — identical to macos-xcode-image.yml, which publishes the same-size image via the same path. Applied to the production release job and the dispatch workflow. (The runner-image-build job’s own TART_NO_AUTO_PRUNE is left alone.)
Verification
- Dispatch build on this branch's workflow: pushed tuist-xcresult-processor on attempt 1, NVRAM layer included.
- go build / go vet / go test clean in infra/macos-host-bootstrap and the CAPI controller module; added TestRenderLaunchdPlist_RendersDisableVMGCWhenSet.
Live hosts
The two current builders (…-rg4h9-xqzbl, …-rg4h9-kgp8s) were patched live and re-verified (flag present in plist + running process). With this merged, future binary rolls keep --disable-vm-gc instead of stripping it.
🤖 Generated with Claude Code