Hive

Self-heal tart-kubelet VM provisioning stalls and improve observability

Metadata
Source: GitHub
Version: runners-controller@0.13.1
Domains:
Published: Jun 19, 2026 · 10:51 UTC
Repository: tuist/tuist

Update

macOS runner Pods could sit in Pending for minutes on healthy nodes when a slow or hung tart call blocked the node agent. The node agent now applies a per-operation timeout to each tart pull, clone, set, stop, delete, get, and list call, so a stuck operation is killed and retried instead of blocking provisioning. It also requeues explicitly after adding a finalizer, so a missed watch event can no longer strand a Pod. A new tart_kubelet_pod_provision_delay_seconds histogram and a CreatingVM Pod event make the gap between a Pod being Scheduled and reaching Running visible.
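The per-operation timeout pattern can be sketched roughly as follows. This is an illustration only. It is written in Python and assumes the node agent shells out to the tart CLI. The agent's actual language, timeout values, and retry policy are not given in this note. The names OP_TIMEOUTS, run_with_timeout, and OperationTimeout are hypothetical, and the deadlines are placeholders.

```python
import subprocess
import time

# Hypothetical per-operation deadlines in seconds. The release note does not
# state the real values, so these numbers are placeholders only.
OP_TIMEOUTS = {
    "pull": 1800.0,
    "clone": 300.0,
    "set": 30.0,
    "stop": 60.0,
    "delete": 60.0,
    "get": 15.0,
    "list": 15.0,
}


class OperationTimeout(Exception):
    """Raised when every attempt of an operation exceeded its deadline."""


def run_with_timeout(op, argv, timeouts=OP_TIMEOUTS, attempts=3, backoff=1.0):
    """Run argv under the deadline configured for `op`, retrying on timeout.

    subprocess.run kills the child when the timeout expires and then raises
    TimeoutExpired, so a hung call cannot block the caller indefinitely.
    Only timeouts are retried; a non-zero exit raises CalledProcessError.
    """
    timeout = timeouts[op]
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout, check=True
            )
        except subprocess.TimeoutExpired:
            if attempt == attempts:
                raise OperationTimeout(
                    f"{op} timed out on all {attempts} attempts ({timeout}s each)"
                )
            # Linear backoff before the next attempt (an assumed policy).
            time.sleep(backoff * attempt)
```

A call such as `run_with_timeout("list", ["tart", "list"])` would return the completed process, or raise OperationTimeout after three killed attempts. Separating the deadline table from the retry loop keeps slow operations like pull on a generous budget, while quick queries like get and list fail fast.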