The fix to stream bootstrap binaries instead of copying them per roll is now available in capi-scaleway@0.8.0. This resolves the OOMKills (exit code 137) caused by transient string copies during fleet-wide rolls. Update to this version to eliminate the memory pressure without needing to raise resource limits.
Hive
fix(infra): stream bootstrap binaries instead of copying them per roll
GitHub issue · Closed
What changed
RunCommandWithStdin in infra/macos-host-bootstrap now takes an io.Reader instead of a string. The four bootstrap binaries (tart-kubelet, tart, tailscale, node_exporter) are streamed to the host with bytes.NewReader over the operator’s already-resident slice instead of string(binary), which copied the whole binary on every call. String payloads (kubeconfig, launchd plist, auth keys, runner token) pass strings.NewReader.
No resource-limit change — the limit is left at the original 512Mi.
Why
The capi-scaleway-applesilicon manager was OOMKilling in production (Exit Code: 137), both replicas ping-ponging the leader lease and firing a Pod CrashLoop / Frequent Restarts alert. This removes the actual cause rather than giving it more headroom.
Root cause
It’s an OOM, not an application error — logs show a clean startup with no panic; the process is killed mid-reconcile.
The manager reads the four macOS bootstrap artifacts into memory once at startup and holds them resident for the process lifetime (~96MB total: tart-kubelet 47MB + tart 22MB + tailscale 14MB + node_exporter 13MB). That part is deliberate. The problem was the transfer path: each host install called RunCommandWithStdin(..., string(binary)), and the []byte → string conversion allocates a fresh copy of the whole binary on every call.
When the embedded tart-kubelet SHA changes — which is exactly what the capi-scaleway@0.7.2 deploy did, since #11044 shipped a new tart-kubelet binary — every ScalewayAppleSiliconMachine shows binary drift and re-rolls at once. With --machine-max-concurrent-reconciles=4, that’s up to four concurrent 47MB copies on top of the ~96MB resident set, and default GOGC pacing lets the heap overshoot before a collection runs:
~96MB resident
+ 4 × 47MB transient string copies (concurrent drift rolls) ≈ 188MB
+ GC headroom (GOGC=100 → heap grows ~2× live before collecting)
= peak past 512Mi during a fleet-wide roll
Already-Ready machines skip the full bootstrap, so this isn’t generic reconcile pressure — it’s specifically the binary-SHA drift roll fanning across the fleet, which recurs every time the embedded tart-kubelet changes.
Why this solution over raising the limit
The earlier revision of this PR bumped the limit to 1Gi + added GOMEMLIMIT. That treats the symptom. The transient copies are pure waste: bytes.NewReader streams straight from the resident slice with zero extra allocation, so the burst disappears entirely. bytes.Reader only reads and never mutates the backing array, so the concurrent rolls share the one resident slice safely. With the copies gone, the working set during a fleet-wide roll stays near the ~96MB baseline — comfortably inside the original 512Mi, no headroom bump needed.
Impact
No customer-facing outage during the incident — the Mac mini nodes kept running (independent kubelets, all Ready). But the fleet control plane was degraded: machine provisioning/recovery, fleet-spread auto-roll, and node-readiness reconciliation flapped. This fix lands in the manager image, so it takes effect once a new capi-scaleway release is built and deployed (not on a plain Helm value roll).
Validation
go build/go vet/go testpass formacos-host-bootstrap.- The
cluster-api-provider-scaleway-applesiliconconsumer still builds.
🤖 Generated with Claude Code