feat(infra): ship the kura cache StorageClass with the chart
GitHub issue · Closed
Draft — opening for review of the approach before merge.
Problem
The scw-local-nvme StorageClass + local-path-provisioner that the kura runner-cache PVCs bind against was a manual, per-workload-cluster bootstrap (infra/k8s/mgmt/bootstrap/local-path-provisioner.yaml) with zero automation:
- mgmt-cluster-apply.yml doesn’t apply it. That workflow targets the management cluster and only applies a fixed allowlist (infra/k8s/clusters/** plus four specific mgmt files); infra/k8s/mgmt/bootstrap/** is excluded.
- No other workflow applies it. The file’s header literally says “apply with kubectl apply -f …”.
So a cluster could enable the kura fleet but silently lack the StorageClass its cache PVCs need. That’s exactly what happened on production during the macOS-runner-cache cutover: the EM node joined and the kura-tuist-scw-fr-par KuraInstance + pod were created, but the pod hung Pending on an unbound scw-local-nvme PVC (FailedScheduling: pod has unbound immediate PersistentVolumeClaims. not found).
Change
Move the provisioner into the tuist chart as templates/kura-fleet-storage.yaml, gated on the same kuraFleet.enabled flag as the fleet itself, and delete the standalone bootstrap manifest. Now the StorageClass ships wherever the fleet that needs it ships — they can’t drift, and a fresh env can’t miss it.
The StorageClass name scw-local-nvme is load-bearing (the scw-fr-par-runners kura region spec provisions PVCs with exactly that name) and is unchanged.
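For reviewers who want a picture of the gating, here is a minimal sketch of the StorageClass part of templates/kura-fleet-storage.yaml. It is not copied from the chart. The file path, the kuraFleet.enabled flag and the scw-local-nvme name come from this description; the provisioner string, volumeBindingMode and reclaimPolicy are assumptions taken from the upstream local-path-provisioner defaults, and the real manifest may set them differently.

```yaml
{{- if .Values.kuraFleet.enabled }}
# Sketch only: rendered when the kura fleet is enabled, omitted otherwise.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Load-bearing: the scw-fr-par-runners region spec requests exactly this class.
  name: scw-local-nvme
# Assumption: upstream local-path-provisioner's default provisioner name.
provisioner: rancher.io/local-path
# Assumptions: upstream defaults, not confirmed against the bootstrap manifest.
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
{{- end }}
```

Wrapping the whole file in the same conditional as the fleet is what keeps the two from drifting: an environment that renders the fleet also renders the class, and one that disables the fleet renders neither.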
One behavior change worth a look
The bootstrap manifest pinned the provisioner controller to the EM pool with a nodeSelector. I removed that in the chart version: the controller only watches PVCs and launches a helper Pod on the target node (which still tolerates the runner-cache taint), so it doesn’t need to live on the cache node. Pinning it would make the release’s helm --wait block until the controller is Available, which can’t happen while the EM node isn’t up yet (e.g. on a cold deploy). That is a regression we don’t want. Flagging it explicitly since it diverges from the validated bootstrap.
Validation
- helm template renders all 7 resources (Namespace, SA, ClusterRole/Binding, Deployment, StorageClass, ConfigMap) when kuraFleet.enabled, and nothing when disabled.
- Deployment renders with no nodeSelector (unpinned); StorageClass scw-local-nvme present.
Note
This doesn’t retro-fix the live clusters: staging already has the bootstrap applied; production still needs the one-time kubectl apply (or this change merged and redeployed) to unstick the currently Pending cache pod.
🤖 Generated with Claude Code