feat(infra): docker-in-runner for Linux pools

GitHub issue · Closed

Metadata

Source

tuist/tuist #10905

Updated

Jun 24, 2026

Domains

Compute

Details

Summary

Add docker-in-runner support for the in-house Linux self-hosted GitHub Actions fleet. Linux runner Pods now ship with a privileged docker:dind native sidecar so workflows that need services: containers, docker build, or buildx run natively on our fleet instead of staying on Namespace.

Validated end-to-end on staging: the server build’s docker_build job ran green on tuist-staging-linux after this PR was deployed.

Why

The Linux runner fleet already wraps every Pod in a kata-qemu microVM. That isolation makes it safe to grant a sidecar container privileged: true, because the privileged surface is bounded by the per-Pod kernel, not the bare-metal host’s. Without docker-in-runner, jobs that need dockerd have to stay on Namespace runners, which fragments the CI fleet and means paying Namespace just for the dockerd its runners provide out of the box.

What changed

runners-controller

dind native sidecar. podtemplate.Build emits a dind init-container with restartPolicy: Always (k8s ≥ 1.29 native sidecar) running the upstream docker:dind image. Sidecar is privileged: true; runner container stays unprivileged. startupProbe: docker info blocks the runner from starting until dockerd is reachable. Mirrors the actions-runner-controller gha-runner-scale-set pattern.
Memory limit on the runner container. kata sizes the microVM from the sum of container memory limits, not requests. Setting Limits == Requests on the runner makes the VM the size we asked for.
Shared volumes.
- dind-sock emptyDir at /var/run (both containers) exposes the docker socket.
- work emptyDir at /home/runner/actions-runner/_work (both containers) makes docker run -v $PWD:/x mounts resolve identically on either side.
- dind-storage emptyDir with medium: Memory at /var/lib/docker on the sidecar only — tmpfs is the only filesystem inside the microVM that satisfies both overlay2’s xattrs and BuildKit’s checksum-time xattr reads. Disk-backed emptyDir is served through virtio-fs, which forces dockerd onto the vfs driver and then trips buildx anyway.
nr_inodes=0 tmpfs remount. kubelet’s medium: Memory emptyDir inherits the kernel default of ~1 inode per 16 KiB; an npm node_modules tree exhausts that long before the byte cap. The sidecar wraps dockerd in sh -c "mount -o remount,nr_inodes=0 /var/lib/docker && exec dockerd ...".
--dind-image manager flag + runnersController.dindImage chart value pin the sidecar image (default docker:28-dind, Renovate-bumped).
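
The pod shape described in this list can be sketched as a plain data structure. This is an illustrative Python model, not the controller's Go code in podtemplate.Build. The function name build_runner_pod, the runner image name, and the runner memory size are assumptions made for the example. The volume names, mount paths, probe, GID, and remount command come from the description, and any dockerd flags beyond --group=123 are elided in the source and left out here.

```python
import json

WORK_DIR = "/home/runner/actions-runner/_work"


def build_runner_pod(dind_image="docker:28-dind",
                     runner_image="example.invalid/linux-runner:dev",  # hypothetical
                     runner_memory="8Gi"):  # hypothetical size
    """Return a Pod spec dict with the sidecar and volume layout described above."""
    shared_mounts = [
        {"name": "dind-sock", "mountPath": "/var/run"},
        {"name": "work", "mountPath": WORK_DIR},
    ]
    sidecar = {
        "name": "dind",
        "image": dind_image,
        # An init container with restartPolicy: Always is a native sidecar.
        "restartPolicy": "Always",
        "securityContext": {"privileged": True},
        "env": [{"name": "DOCKER_GROUP_GID", "value": "123"}],
        # Lift the tmpfs inode cap, then replace the shell with dockerd.
        # Further dockerd flags are elided in the description and omitted here.
        "command": [
            "sh", "-c",
            "mount -o remount,nr_inodes=0 /var/lib/docker && exec dockerd --group=123",
        ],
        # The runner container does not start until this probe succeeds.
        "startupProbe": {"exec": {"command": ["docker", "info"]}},
        "volumeMounts": shared_mounts + [
            {"name": "dind-storage", "mountPath": "/var/lib/docker"},
        ],
    }
    runner = {
        "name": "runner",
        "image": runner_image,
        # Limits == Requests so the microVM is sized as requested.
        "resources": {
            "requests": {"memory": runner_memory},
            "limits": {"memory": runner_memory},
        },
        "volumeMounts": list(shared_mounts),
    }
    return {
        "runtimeClassName": "kata-qemu",
        "initContainers": [sidecar],
        "containers": [runner],
        "volumes": [
            {"name": "dind-sock", "emptyDir": {}},
            {"name": "work", "emptyDir": {}},
            {"name": "dind-storage", "emptyDir": {"medium": "Memory"}},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_runner_pod(), indent=2))
```

Only the sidecar carries a securityContext and the dind-storage mount. The runner shares just the socket and work volumes, which is what keeps it unprivileged.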

Linux runner image

Adds docker-ce-cli, docker-buildx-plugin, docker-compose-plugin from the official Docker apt repo.
Pins the docker group GID to 123 so the runner user can reach the sidecar’s socket without sudo (sidecar runs dockerd --group=123).
No daemon in this image — the sidecar carries it. dispatch-poll.sh is unchanged from the pre-PR shape.

Helm chart

New runnersController.dindImage value, threaded to the manager as --dind-image.
runnerpool.yaml template renders the Linux pool with runtimeClass: kata-qemu required.
Staging values pinned at the validated 48 GiB pod budget. Linux runner image pinned at sha-009aa3f528ee.

Server Deployment workflow (defensive fixes)

Bumped Build + push timeout 40m → 60m (the buildx driver swap from PR #10886 removed Namespace’s persistent build cache).
Added registry-backed BuildKit cache to ghcr.io/tuist/tuist-buildcache:server (mode=max) so first-built layers get reused on follow-up deploys.
Pinned helm@3.16.3 kubectl@1.31.3 in mise install args and added mise use --global so the shims actually resolve. Set MISE_GITHUB_ATTESTATIONS: 0 to stop burning the runner user’s GitHub API quota on rerun-heavy days.
mise.toml: helm 4.2.0 → 3.16.3 because aqua’s helm provider builds the download URL without the v prefix and 4.x 404s upstream. Revert when aqua’s registry catches up.
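
The registry-backed cache mentioned above maps onto two docker buildx build flags, --cache-from and --cache-to. The sketch below composes such a command line in Python. It is an assumption about shape only: the workflow's real invocation is not shown in this description (it may well pass the same settings through an action's inputs), and the image tag and function name are hypothetical.

```python
CACHE_REF = "ghcr.io/tuist/tuist-buildcache:server"


def buildx_build_argv(image_tag, context=".", cache_ref=CACHE_REF):
    """Compose a `docker buildx build` command line with a registry cache."""
    return [
        "docker", "buildx", "build",
        # Import previously exported layers from the cache image.
        "--cache-from", f"type=registry,ref={cache_ref}",
        # Export layers from all stages (mode=max) back to the same ref.
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        "--tag", image_tag,
        "--push",
        context,
    ]


if __name__ == "__main__":
    # Hypothetical image tag, for illustration only.
    print(" ".join(buildx_build_argv("ghcr.io/example/server:dev")))
```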

Validation

Server canary 26409481185 — all 9 jobs green including Docker build on tuist-tuist-runner-pool-linux-ubuntu-22-04-runner-0bfc3d76, build wall time 14 min on the staging fleet.

Iteration history (for posterity)

operation not supported on /noora/priv/static — fixed by tmpfs /var/lib/docker.
dockerd EOF / Cannot connect to the Docker daemon — kata defaulted the VM to 2 GiB; fixed by setting memory limits so kata sizes from them.
mix compile ... cannot allocate memory — bumped pod 8 → 16 → 32 → 48 GiB.
no space left on device during npm install — tmpfs default inode count exhausted; fixed by nr_inodes=0 remount in sidecar entrypoint.
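
Two of the failures above come down to simple arithmetic, sketched here in Python. Both functions are simplified models built from this PR's wording, not from kata or kernel source: the sizing model assumes the VM gets the 2 GiB default when no limits are set and the sum of limits otherwise (the real runtime may add a base allotment on top), and the inode estimate uses the roughly one-inode-per-16-KiB figure quoted earlier. The 40 GiB + 8 GiB split is a hypothetical way of reaching the 48 GiB pod budget.

```python
GIB = 1024 ** 3
KIB = 1024


def modeled_vm_memory(container_limits, default_bytes=2 * GIB):
    """Model microVM memory: default if no limits are declared, else their sum."""
    declared = [limit for limit in container_limits if limit is not None]
    if not declared:
        return default_bytes
    return sum(declared)


def default_tmpfs_inodes(capacity_bytes, bytes_per_inode=16 * KIB):
    """Estimate the default inode budget for a tmpfs of the given capacity."""
    return capacity_bytes // bytes_per_inode


if __name__ == "__main__":
    # No limits on either container: the VM falls back to the default.
    print(modeled_vm_memory([None, None]) // GIB)          # 2
    # Hypothetical split that adds up to the 48 GiB pod budget.
    print(modeled_vm_memory([40 * GIB, 8 * GIB]) // GIB)   # 48
    # A 32 GiB tmpfs at one inode per 16 KiB.
    print(default_tmpfs_inodes(32 * GIB))                  # 2097152
```

Under that ratio, a dependency tree of many small files can run out of inodes while most of the bytes are still free, which is why the remount with nr_inodes=0 removes the inode cap rather than raising the size.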

Performance vs. Namespace

This PR gives workflows dockerd, not Namespace-parity build speed. Each Pod is single-shot — the tmpfs at /var/lib/docker and any RUN --mount=type=cache mounts vaporize when the job exits. Base image pulls (swift:6.2-bookworm ~1 GB, node:22-slim ~150 MB) and mix deps/aube install/swift toolchain steps run cold on every job.

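The deploy workflow’s Build + push already gets a registry-backed BuildKit cache (ghcr.io/tuist/tuist-buildcache:server, mode=max) added in this PR. With mode=max, the layers of every build stage, including intermediate ones, are exported and re-imported, so an unchanged mix deps or npm install layer is reused instead of rebuilt. That is close to the effect of Namespace’s persistent disk cache, just over the network. The contents of RUN --mount=type=cache mounts (mix/hex, aube, npm, swift toolchain) are not part of the exported cache, so a step whose layer is invalidated still downloads from scratch.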

What we still pay every build that Namespace doesn’t:

FROM image pulls against external registries (swift:6.2-bookworm ~1 GB compressed, node:22-slim ~150 MB). On Namespace these are a local NVMe read; for us they’re a fresh network pull. ~2-3 min.
Pulling the BuildKit cache manifest + blobs from ghcr.io/tuist/tuist-buildcache:server. ~1-2 min depending on the working set.

CPU is not an advantage for us at the current settings: pod.cpuMilli: 2000 gives each runner microVM 2 vCPU vs. Namespace’s namespace-profile-default-with-volume 4 vCPU. The bare-metal AX42-U has 8 cores but we only allocate 2 of them per Pod; the rest sit idle (the original choice was driven by RAM density, not CPU). Bumping pod.cpuMilli is a cheap follow-up if the build turns out CPU-bound rather than I/O-bound.

Realistic shape:

First-ever cold build: 14 min (validated — see Validation above).
Warm build with the registry cache populated: expect a few minutes less, dominated by the base-image + cache-pull tax (image pulls + cache layer fetch are mostly serial network ops, not parallel CPU work).
Namespace warm: ~15 min, no cache-pull tax.

The follow-up items below (node-local pull-through registry mirror, kata direct-volume for /var/lib/docker) close the cache-pull gap; bumping cpuMilli closes the CPU gap if it turns out to matter.

Follow-up ideas, in order of bang-for-buck:

Node-local pull-through registry mirror. Run a distribution registry or dragonfly/spegel on each bare-metal node; configure dockerd’s registry-mirrors to hit it first. Wipes the base-image pull cost across Pods on the same host. Low complexity, biggest single-step win.
Real disk for /var/lib/docker via kata direct-volume. Mount a node-disk local PV into the dind sidecar at /var/lib/docker using kata’s direct-volume feature — bypasses virtio-fs (the reason we’re on tmpfs today) and gives real ext4 inside the microVM with full overlay2 + xattr support. This is also the correctness/portability fix: with real-disk storage, workflows can use the default docker-container buildx driver and skip every tmpfs-related size/inode/driver: docker workaround. Same shape as GitHub-hosted / Namespace; an action that works there works here, no modifications. BuildKit state surviving across jobs is the perf bonus. Higher complexity; needs per-node block device provisioning.
Shared remote BuildKit instance. Run buildkitd as a per-host DaemonSet outside the runner Pods; workflows connect via docker buildx create --driver remote tcp://<node>:1234. Single shared cache state; needs auth + tenant isolation if we ever onboard customer workloads.
Larger Pods so the in-VM cache can hold more. Cheapest knob but only delays the cold-pull problem.
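
For the first idea, dockerd reads mirrors from the registry-mirrors key of its daemon.json. The Python sketch below renders such a file. The mirror URL and function name are hypothetical, and how the file would be delivered into the sidecar is left open.

```python
import json


def daemon_config(mirror_url):
    """Render a dockerd daemon.json body that tries the mirror first."""
    return json.dumps({"registry-mirrors": [mirror_url]}, indent=2)


if __name__ == "__main__":
    # Hypothetical node-local mirror address.
    print(daemon_config("http://registry-mirror.node.internal:5000"))
```

Note that dockerd applies registry-mirrors to Docker Hub pulls only, so this would cover base images such as swift and node but not the BuildKit cache pulls from ghcr.io.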

For now: the dind sidecar unblocks the migration off Namespace for correctness (services: containers + docker build work), and registry buildcache gets us partial parity for buildx layer reuse. Closing the rest of the gap is the cache-warming work above.

Post-merge cleanup

Flip the Docker build CI job onto our own runners. This PR leaves docker_build on namespace-profile-default-with-volume (unchanged from main) so the merge isn’t gated on a self-hosted build running against a shared staging cluster that other branches concurrently deploy to. The dind sidecar and the tuist-staging-linux-large pool are validated and in place. The follow-up flips runs-on: to tuist-staging-linux-large, and bumps the job timeout for the bare-metal fleet’s cold-cache tax, once the controller and pool config are deployed to prod via the release pipeline and the env is no longer contended. That’s the real migration off Namespace for the server image build.
Remove the runnersController.features.dindImage gate. The flag exists only as a transition safeguard: canary and production still pin controller 0.3.0, which doesn’t register --dind-image, so an unguarded chart upgrade would make flag.Parse fail on the unknown flag and crash the manager at startup. Once the release pipeline ships a controller version that includes this PR’s binary changes and bumps runnersController.image.tag in values-managed-common.yaml, every env has a compatible binary and the gate is dead weight. A follow-up PR removes the gate entirely (the template line becomes ungated, and the value drops from values.yaml and values-managed-staging.yaml).
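
The gate's effect on the rendered manager arguments can be modeled in a few lines of Python. This is a sketch of the logic only, not the chart template itself. The function name is hypothetical, and the value keys follow the names used in this description.

```python
def manager_args(values):
    """Return the dind-related manager args the chart would render."""
    controller = values.get("runnersController", {})
    gate_on = controller.get("features", {}).get("dindImage", False)
    image = controller.get("dindImage")
    args = []
    # Only pass the flag when the gate is on, so older controller
    # binaries that do not register --dind-image never see it.
    if gate_on and image:
        args.append(f"--dind-image={image}")
    return args


if __name__ == "__main__":
    staging = {"runnersController": {"dindImage": "docker:28-dind",
                                     "features": {"dindImage": True}}}
    production = {"runnersController": {"dindImage": "docker:28-dind"}}
    print(manager_args(staging))     # ['--dind-image=docker:28-dind']
    print(manager_args(production))  # []
```

Removing the gate later amounts to dropping the gate_on condition, which is safe only once every environment runs a binary that registers the flag.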

Comparison to actions-runner-controller

The pod shape is taken from ARC’s gha-runner-scale-set chart almost verbatim. Concrete choices we lifted (each subtle enough that getting it wrong silently breaks something):

k8s ≥ 1.29 native sidecar (initContainer + restartPolicy: Always) with startupProbe: exec docker info, which replaces what would otherwise be a polling loop in the runner’s entrypoint.
DOCKER_GROUP_GID=123 env on the sidecar + --group=123 on dockerd + a docker group pinned to GID 123 in the runner image. The runner reaches the socket without sudo only when all three agree.
work emptyDir mounted at the same path in both containers (/home/runner/actions-runner/_work for us) so docker run -v $PWD:/x resolves identically on either side.
dind-sock emptyDir at /var/run for the socket itself.
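
Because the socket is reachable without sudo only when the three GID settings agree, a small consistency check is easy to write. The Python sketch below is illustrative and not part of this PR. The function names are hypothetical, and the inputs stand in for the sidecar env, the dockerd argument list, and the GID baked into the runner image.

```python
def dockerd_group_gid(dockerd_argv):
    """Extract the numeric GID from a --group=<gid> argument, if present."""
    for arg in dockerd_argv:
        if arg.startswith("--group="):
            value = arg.split("=", 1)[1]
            return int(value) if value.isdigit() else None
    return None


def gids_agree(sidecar_env, dockerd_argv, runner_image_docker_gid):
    """True only when the env var, the dockerd flag, and the image GID match."""
    env_gid = sidecar_env.get("DOCKER_GROUP_GID")
    flag_gid = dockerd_group_gid(dockerd_argv)
    if env_gid is None or flag_gid is None:
        return False
    return int(env_gid) == flag_gid == runner_image_docker_gid


if __name__ == "__main__":
    print(gids_agree({"DOCKER_GROUP_GID": "123"}, ["dockerd", "--group=123"], 123))  # True
    print(gids_agree({"DOCKER_GROUP_GID": "123"}, ["dockerd", "--group=123"], 999))  # False
```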

Where we diverge from ARC, and the reasoning:

Substrate. ARC runs on regular cloud-VM Kubernetes nodes. Their /var/lib/docker goes on the node’s disk via overlay2 and can be a PVC for cross-job cache. We’re on bare-metal nodes with kata-qemu microVMs underneath, and virtio-fs can’t carry overlay2’s xattrs — that’s the entire reason our /var/lib/docker lives on tmpfs with the nr_inodes=0 remount. The follow-up “kata direct-volume” item in the Performance section is what would close that gap and make our shape match ARC’s.
containerMode: dind vs containerMode: kubernetes. ARC offers a second mode that translates the workflow’s container:/services: blocks into sibling Kubernetes Pods on the cluster, with no privileged surface anywhere and a real PVC for the work volume. The cost: no docker build. We picked dind because the server image needs docker_build. It is worth knowing this mode exists for any future runner pool that only needs services:.
docker:dind-rootless variant. Same image, no privileged container. Compelling for ARC users on shared multi-tenant clusters. Less relevant for us because kata-qemu already bounds the privileged blast radius to a microVM.
Queue/dispatch model. ARC ships a runner-scale-set-listener Pod that pulls GitHub queue depth and scales the pool. We do the equivalent in-process via the Tuist server’s dispatch_for_sa. Same outcome; different code path.

What’s reassuring: ARC documents the inter-job cache loss in dind mode as a known limitation and points users at exactly the two answers we use (type=registry/type=gha BuildKit cache exporters) plus containerMode: kubernetes for workloads that don’t need docker build. The places this PR improvises (tmpfs + remount + 48 GiB pod) are kata-substrate workarounds, not improvements over ARC — and the Performance follow-ups move us toward ARC’s standard shape, not away from it.

One ARC user-facing pitfall worth knowing because it affects us too: containers started inside a workflow are siblings of the dind sidecar, not children of the runner container. Workflows that docker network create and try to attach the runner container itself to the new network don’t work the way users expect — the runner is in the Pod’s network namespace, not the dind sidecar’s docker bridge.

Test plan

go test ./... in infra/runners-controller/.
helm lint clean on staging/canary/production values overlays.
Staging deploy via Server Deployment workflow.
docker_build job ran green on tuist-staging-linux during canary validation (the runs-on flip itself is a follow-up PR after this lands).
Reviewer: confirm the staging-only image pins look right; the common runnersController.image.tag will catch up via the release pipeline post-merge.

🤖 Generated with Claude Code

Comments

tuist[bot] May 26, 2026

🛠️ Tuist Run Report 🛠️

Tests 🧪

Scheme	Status	Cache hit rate	Tests	Skipped	Ran	Commit
TuistAcceptanceTests	❌	0 %	0	0	0	7be9643dc

Builds 🔨

Scheme	Status	Duration	Commit
TuistAcceptanceTests	✅	4m 30s	a6926569a

tuist-atlas[bot] Jun 3, 2026

The docker-in-runner feature for Linux pools is now available in runners-controller@0.6.0. Update to this version to use the new capability on your Linux runner pools.

tuist-atlas[bot] Jun 4, 2026

The docker-in-runner feature for Linux pools is now available in linux-runner-image@0.2.0 (ghcr.io/tuist/tuist-linux-runner:0.2.0). Update to use the new functionality.