
fix(infra): unwedge the xcresult-processor image release (tart-kubelet GC + workflow cleanup)

GitHub issue · Closed

Metadata

Source: tuist/tuist #11428
Updated: Jun 25, 2026
Domains: Compute

Details

Combines the two follow-ups to the xcresult-processor image release wedge into one PR (supersedes #11427).

Background

The xcresult-processor push failed ~10× in a row with Error: The file "nvram.bin" doesn't exist. The cause was not the workflow: the builder-fleet hosts run tart-kubelet alongside the GitHub Actions runner, and tart-kubelet’s orphan-VM GC was reaping the in-flight Packer build VM mid-tart push. The fix flag --disable-vm-gc was added in #11072, but the running builders had a stale, flag-less plist. I patched both live builders out-of-band to unblock the release (verified: a dispatch build then pushed green on attempt 1).

This PR makes the fix durable and cleans up the workflow.

1. Durability — stop binary rolls from stripping --disable-vm-gc

renderLaunchdPlist keys the flag off cfg.GHActionsRunner != nil:

  • Bootstrap (Run) sets GHActionsRunner → flag rendered. ✅
  • Drift-update (UpdateTartKubelet), fired on every tart-kubelet binary roll, re-renders the plist from a Config that never sets GHActionsRunner → flag stripped. ❌ The omission is deliberate: the update path doesn’t reinstall the runner, and resolving the runner config would mint a fresh registration token on each loop.

Since binaries stream independently of full bootstrap rolls (#11065), a host ends up with a fresh binary and a flag-less plist — exactly the wedged state.

Fix: add an explicit Config.DisableVMGC bool. renderLaunchdPlist renders the flag when GHActionsRunner != nil (bootstrap, unchanged) or DisableVMGC is true (update path). The reconciler sets DisableVMGC: machine.Spec.GHActionsRunner != nil on the update Config. Additive and backward-compatible; bootstrap path and its tests untouched.
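The condition described above can be sketched as follows. This is a minimal illustration, not the code in this PR: GHActionsRunnerConfig, kubeletArgs, and hasFlag are hypothetical names, the Config struct carries only the two fields discussed here, and the real renderLaunchdPlist emits a full plist rather than an argument slice.

```go
package main

import "fmt"

// GHActionsRunnerConfig is a hypothetical stand-in for the runner settings
// resolved during bootstrap; the real type is not shown in this PR description.
type GHActionsRunnerConfig struct{}

// Config mirrors only the two fields relevant to the flag decision.
type Config struct {
	GHActionsRunner *GHActionsRunnerConfig // set on the bootstrap path only
	DisableVMGC     bool                   // set explicitly on the update path
}

// kubeletArgs is a hypothetical helper returning the arguments that would be
// rendered into the plist. The binary name here is an assumption.
func kubeletArgs(cfg Config) []string {
	args := []string{"tart-kubelet"}
	// Render the flag on bootstrap (runner configured) or when the
	// update path asks for it explicitly.
	if cfg.GHActionsRunner != nil || cfg.DisableVMGC {
		args = append(args, "--disable-vm-gc")
	}
	return args
}

// hasFlag reports whether flag appears in args.
func hasFlag(args []string, flag string) bool {
	for _, a := range args {
		if a == flag {
			return true
		}
	}
	return false
}

func main() {
	bootstrap := Config{GHActionsRunner: &GHActionsRunnerConfig{}}
	update := Config{DisableVMGC: true} // runner deliberately left nil
	plain := Config{}                   // host without a runner

	fmt.Println(hasFlag(kubeletArgs(bootstrap), "--disable-vm-gc")) // true
	fmt.Println(hasFlag(kubeletArgs(update), "--disable-vm-gc"))    // true
	fmt.Println(hasFlag(kubeletArgs(plain), "--disable-vm-gc"))     // false
}
```

Carrying the decision as a plain bool lets the update path keep the flag without resolving the runner config, so no registration token is minted on each loop.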

2. Workflow cleanup — drop the nvram push workarounds

#11417 / #11419 chased the wrong layer, adding TART_NO_AUTO_PRUNE and a clone-the-base “restore nvram.bin” step to the push. With the GC reaping the whole bundle, the restore’s cp failed and the base-clone only added disk pressure. This reverts both push steps to a plain 3× retry — identical to macos-xcode-image.yml, which publishes the same-size image via the same path. Applied to the production release job and the dispatch workflow. (The runner-image-build job’s own TART_NO_AUTO_PRUNE is left alone.)

Verification

  • Dispatch build on this branch’s workflow: pushed tuist-xcresult-processor on attempt 1, NVRAM layer included.
  • go build, go vet, and go test are clean in infra/macos-host-bootstrap and the CAPI controller module; added TestRenderLaunchdPlist_RendersDisableVMGCWhenSet.

Live hosts

The two current builders (…-rg4h9-xqzbl, …-rg4h9-kgp8s) were patched live and re-verified (flag present in plist + running process). With this merged, future binary rolls keep --disable-vm-gc instead of stripping it.

🤖 Generated with Claude Code
