This patch fixes the macOS runner queue bottleneck that left CI jobs Pending for minutes even when nodes were free and healthy.
The root cause was the runner image rollout: when a new image digest rolled out, every node tried to pull several gigabytes at once, collapsing the warm pool. Image rolls are now paced: only a limited percentage of the fleet updates at a time, and additional pods are drained only after the pods already rolling have finished warming up.
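To illustrate the pacing rule, here is a minimal Go sketch. It is not the code from this patch: the names rollCap and nextToDrain, the percentage input, and the round-up behaviour are all assumptions made for the example. Pods that are still warming up are counted as rolling, so no new drains start until they finish.

```go
package main

// rollCap returns how many pods may roll at the same time for a fleet of
// fleetSize pods when at most percent of the fleet may update at once.
// Hypothetical helper: rounding up (so small fleets still make progress)
// is an assumption, not something the patch description states.
func rollCap(fleetSize, percent int) int {
	if fleetSize <= 0 || percent <= 0 {
		return 0
	}
	if percent > 100 {
		percent = 100
	}
	// Integer ceiling of fleetSize*percent/100.
	return (fleetSize*percent + 99) / 100
}

// nextToDrain picks which stale pods may be drained now. rolling counts pods
// that were already drained and have not finished warming up; while that
// number is at or above limit, nothing new is drained.
func nextToDrain(stale []string, rolling, limit int) []string {
	free := limit - rolling
	if free <= 0 {
		return nil
	}
	if free > len(stale) {
		free = len(stale)
	}
	return stale[:free]
}
```

With a fleet of 20 pods and a 25 percent limit, rollCap gives 5. If 4 pods are still warming up, nextToDrain releases one more stale pod, and none once all 5 slots are taken.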
A secondary stall came from tart-kubelet, the macOS VM provisioning agent. A single slow or hung tart pull, clone, run, or delete could wedge the whole node’s provisioning loop while the node still reported Ready. Per-operation timeouts and retries now self-heal those stalls, and the agent requeues reconciliation after adding finalizers so missed watch events can’t strand pods.
Visibility also improved: new rollout metrics show how many pods are rolling, how many are stale, and the concurrency cap, and a new tart_kubelet_pod_provision_delay_seconds metric plus a CreatingVM event make the gap between Scheduled and Running visible.
Docker image: ghcr.io/tuist/capi-provider-scaleway-applesilicon:0.10.1