
Remote container builders

#69 · Public · Created directly

Compute
Proposed
Proposal

Summary

This RFC proposes a remote container build service for Tuist customers: a per-account BuildKit builder, running inside a runner microVM with a persistent per-account cache volume, to which a customer sends its docker build. Builds run isolated like any runner job and stay warm across runs instead of recompiling from cold. That is the same value the managed remote-build services sell, delivered on the fleet Tuist already operates. It is a new workload on the runner fleet, reusing its shapes, microVM isolation, token model, and billing ledger. The per-account cache volume this introduces is the foundation of a broader customer product: generic cache volumes that any runner job can mount at any path (SwiftPM, DerivedData, Gradle, Go), which for Tuist’s mostly-Apple customer base is arguably the higher-value feature. This RFC builds that foundation and makes the container builder its first consumer, so the generic product is additive on top rather than a rewrite.

Motivation

Container builds on standard CI are slow because nothing compiled survives between runs. CI runners are ephemeral, so every build starts with an empty cache; the usual workaround, a registry-backed layer cache, is coarse (one changed source file invalidates a whole stage), has to be pushed and pulled across the network every run, and never gives incremental compilation. Teams that build container images in CI pay this on every push.

Tuist already sells the two things that fix it, runners and cache, and owns the substrate to combine them: a compute fleet with a microVM isolation model that can safely run untrusted builds, local-NVMe storage, and the account and CLI relationship to meter and bill it. A remote builder is a native extension of that platform, not a new one, and it is a category customers already pay other vendors for. We feel the problem ourselves too: our own server image build runs cold on every deploy for exactly these reasons, which makes us a natural first tenant to validate the product against (see Rollout).

Current state (the substrate this builds on)

The runner fleet already provides everything the builder needs except the builder workload itself:

  • The runners-linux bare-metal pool (today caph-managed Hetzner Robot, planned to move to Scaleway Dedibox) with local-NVMe hosts.
  • The local-path-provisioner + scw-local-nvme StorageClass that carves local-NVMe PVs on those hosts (infra/helm/tuist/templates/kura-fleet-storage.yaml).
  • The runner isolation model in infra/runners-controller/AGENTS.md: each job in its own kata-fc microVM, a poller/runner split so the dispatch token never shares a container with customer code, one-shot lifecycle, the guest treated as adversarial.
  • The per-account primitives: Tuist.Runners.Catalog (shapes), Profiles (per-account label to shape), Claims (per-account concurrency), and RunnerSessions (append-only billing intervals).

The remote builder is a new workload on top of all of it, the same way the sandbox API (spec #2) is.

Proposal

A remote build is a runner-fleet workload: BuildKit running inside a kata-fc microVM, scheduled on the shared Linux pool, with the requesting account’s cache volume attached. The microVM is the tenant boundary; the volume is the warmth. Nothing is shared across accounts except the underlying capacity.

The builder workload

A build request schedules a Pod with runtimeClassName: kata-fc (the same isolation as runners) running buildkitd on a Linux-pool node. The account’s cache volume, a scw-local-nvme PV holding /var/lib/buildkit (layer cache plus cache mounts), is attached at boot. Compute is ephemeral and recyclable; persistent state lives entirely in the volume, so a dead builder Pod costs a cold rebuild for that account, never correctness or another account’s data.
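
The paragraph above can be pictured as a Pod manifest. The following is a minimal sketch, not the controller's actual output: the function name, the label key, the image tag, and the listen address are illustrative assumptions, while runtimeClassName: kata-fc and the /var/lib/buildkit mount come from this RFC.

```python
def builder_pod_manifest(account_id: str, build_id: str, pvc_name: str) -> dict:
    """Build a Pod manifest (as a plain dict) for one remote build.

    Hypothetical helper: only `kata-fc` and the `/var/lib/buildkit` mount
    path are taken from the RFC; every other value is an assumption.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"builder-{build_id}",
            # Assumed label key, used here only to tie the Pod to an account.
            "labels": {"tuist.dev/account": account_id},
        },
        "spec": {
            # Same microVM isolation as runner jobs.
            "runtimeClassName": "kata-fc",
            # Compute is one-shot: a dead builder is replaced, not restarted.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "buildkitd",
                    "image": "moby/buildkit:latest",  # assumed image tag
                    "args": ["--addr", "tcp://0.0.0.0:1234"],  # assumed port
                    "volumeMounts": [
                        {"name": "cache", "mountPath": "/var/lib/buildkit"}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "cache",
                    # The account's scw-local-nvme claim holds all persistent state.
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }
            ],
        },
    }
```

Because all persistent state sits behind the claim, deleting and recreating this Pod loses nothing but warmth.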

Reaching and driving it

The customer points docker buildx create --driver remote (or a tuist CLI wrapper) at a brokered per-account endpoint, the same server-side TCP broker pattern the sandbox API uses for SSH (brokers.tuist.dev, keyed by account plus a short-lived per-build token). Authentication is the account agent token; Tuist.Authorization enforces the feature flag, plan, and quota. Pod networking stays internal; no per-build public IPs. The customer’s build ships its context to the builder, which does the pull, compile, and push, so nothing about their local machine or CI runner needs to be fast.

Because /var/lib/buildkit persists in the account’s volume, the customer’s layer cache and --mount=type=cache mounts survive across builds: an unchanged stage is a warm hit instead of a cold rebuild. We provide the warm volume and never touch their Dockerfile, so build correctness is exactly that of local BuildKit.
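
As a rough illustration of the customer side, the sketch below assembles the two docker buildx invocations a CLI wrapper might run. The helper name, the builder name, the endpoint port, and the image reference are assumptions; how buildx presents the per-build token to the broker is still an open question in this RFC, so no auth flags are shown.

```python
from __future__ import annotations


def buildx_commands(
    builder_name: str,
    endpoint: str,
    image_ref: str,
    context_dir: str = ".",
) -> list[list[str]]:
    """Return argv lists that register a remote builder and run one build.

    Hypothetical wrapper logic; the endpoint would be the brokered
    per-account address handed out by the server.
    """
    create = [
        "docker", "buildx", "create",
        "--name", builder_name,
        "--driver", "remote",
        endpoint,
    ]
    build = [
        "docker", "buildx", "build",
        "--builder", builder_name,
        "--tag", image_ref,
        "--push",  # the builder pushes, so the image never crosses the client
        context_dir,
    ]
    return [create, build]
```

For example, buildx_commands("tuist-remote", "tcp://brokers.tuist.dev:443", "registry.example.com/app:1") yields the create and build commands in order; the port and registry in that call are placeholders.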

Cache affinity and warm pool

The account’s cache volume is the affinity unit. A local-NVMe PV pins to one node, so an account’s builds schedule to where its volume lives (or, if reattach is supported, the volume follows the build to another node; see Open questions). A small warm pool of pre-booted builder microVMs per shape keeps build start in seconds, with cold boot as the fallback, the same warm-pool shape the sandbox API uses.
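
The placement rule can be sketched as follows, assuming the simpler of the two options (builds pinned to the volume's node) and assuming warm builders are counted per node for a given shape. All names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    node: str
    warm: bool  # True if a pre-booted builder was claimed, False on cold boot


def place_build(
    volume_node: Optional[str],
    warm_by_node: dict[str, int],
    candidate_nodes: list[str],
) -> Placement:
    """Pick a node for a build and claim a warm builder there if one exists.

    volume_node is the node holding the account's cache volume, or None
    when the account has no volume yet (its first build).
    """
    if volume_node is not None:
        # Affinity wins: the local-NVMe PV cannot move, so the build goes to it.
        node = volume_node
    else:
        # No volume yet: prefer any node with a warm builder, else the first.
        node = next(
            (n for n in candidate_nodes if warm_by_node.get(n, 0) > 0),
            candidate_nodes[0],
        )
    warm = warm_by_node.get(node, 0) > 0
    if warm:
        warm_by_node[node] -= 1  # the claimed builder leaves the pool
    return Placement(node=node, warm=warm)
```

Note the consequence of pinning: an account whose volume sits on a node with an empty warm pool takes the cold-boot path even when other nodes have warm builders.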

Lifecycle and billing

A build moves pending -> ready (builder up, volume attached) -> building -> idle/terminating, reaped on completion, deadline, or idle. Each build records an interval in a RunnerSessions-style ledger; per-account concurrency runs through Claims; the cache volume accrues a separate storage line while it persists, the same split the sandbox API uses for suspension snapshots. Quotas and concurrency caps are per account.
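
A minimal sketch of that lifecycle as a state machine with an append-only ledger. It assumes idle and terminating are distinct states and that the billed interval runs from ready to terminating; the RFC fixes neither detail, and the class and field names are illustrative.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    READY = "ready"
    BUILDING = "building"
    IDLE = "idle"
    TERMINATING = "terminating"


# Every state may be reaped (deadline), so each can move to TERMINATING.
ALLOWED = {
    State.PENDING: {State.READY, State.TERMINATING},
    State.READY: {State.BUILDING, State.TERMINATING},
    State.BUILDING: {State.IDLE, State.TERMINATING},
    State.IDLE: {State.TERMINATING},
    State.TERMINATING: set(),
}


class Build:
    def __init__(self, account_id: str, ledger: list):
        self.account_id = account_id
        self.state = State.PENDING
        self._ledger = ledger  # append-only list of (account, start, end)
        self._started_at = None

    def transition(self, new_state: State, now: float) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        if new_state is State.READY:
            # Assumed billing start: builder up and volume attached.
            self._started_at = now
        if new_state is State.TERMINATING and self._started_at is not None:
            # Close the interval; a build reaped while pending bills nothing.
            self._ledger.append((self.account_id, self._started_at, now))
        self.state = new_state
```

Storage for the cache volume would accrue on a separate line and is deliberately not modeled here.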

Security

The model is the runner model, not a new one. This is the core reason the design is a per-account microVM builder and not a shared buildkitd daemon.

  • Isolation: one kata-fc microVM per build, guest treated as adversarial, identical to runners. There is no shared build daemon across tenants.
  • Cache isolation: each account’s cache is its own volume and its own BuildKit store. No cross-tenant content-addressed dedup oracle, and cache poisoning is bounded to the account that caused it.
  • Tokens: the account agent token mints a job-scoped build credential that never shares a container with the build, mirroring the runner poller/runner split. The broker token is per build and short-lived.
  • Build secrets and registry credentials are scoped per build, host-custody where the broker allows rather than guest-custody.
  • Egress is contained per account, reusing the stable-egress controls.
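
To make "per build and short-lived" concrete, here is one possible shape for the broker token: an HMAC-signed string binding account, build, and expiry. This is purely an illustrative assumption; the RFC does not specify a token format, and the function names, field layout, and the requirement that build ids contain no colon are all hypothetical.

```python
import hashlib
import hmac


def mint_build_token(secret: bytes, account_id: str, build_id: str, expires_at: int) -> str:
    """Sign "account:build:expiry" with a server-held secret."""
    payload = f"{account_id}:{build_id}:{expires_at}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"


def verify_build_token(secret: bytes, token: str, account_id: str, build_id: str, now: int) -> bool:
    """Accept only an unexpired token issued for this exact account and build."""
    try:
        tok_account, tok_build, tok_exp, sig = token.rsplit(":", 3)
        expires_at = int(tok_exp)
    except ValueError:
        return False  # malformed token
    payload = f"{tok_account}:{tok_build}:{tok_exp}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(sig, expected)  # constant-time comparison
        and tok_account == account_id
        and tok_build == build_id
        and now < expires_at
    )
```

The point of the sketch is the binding: a token leaked from one build is useless for another account, another build, or after expiry.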

Cache volumes as a product, and this builder as their first consumer

The per-account cache volume here is deliberately built as a general mechanism, not a builder-specific one: a named per-account scw-local-nvme PVC attached to a fleet job at a declared path, with the attach, affinity, quota, eviction, and billing machinery living in the runners-controller. The container builder is simply its first consumer, mounting it at /var/lib/buildkit.

That same mechanism is a customer product in its own right, and for Tuist’s audience probably the larger one. Most customers build Xcode (and some Android), not Linux containers, so their dominant CI cost is caching SwiftPM/DerivedData/Mint/Homebrew or Gradle across runs, today paid as slow actions/cache tarball round-trips. A persistent cache volume mounted straight into the runner job replaces that for the majority of the base. So this RFC is not only unblocking container builds; it lays the foundation for “mount any named cache at any path into any runner job,” with the builder proving the volume lifecycle on a single, well-behaved consumer first.

The layering keeps that future additive. Ship the volume single-writer first (one job holds it at a time; concurrent jobs queue or fall back to cold), which is the useful common case and the genuinely simple foundation; container builds consume it as-is, since BuildKit serializes its own store. The hard part, concurrent warmth via copy-on-write snapshots or per-branch volumes (the bar a polished cache-volume product eventually meets), is deferred for both the builder and the generic product, gated on demand. This RFC builds only what the persistent-build case needs and does not specialize the mechanism to it, so exposing it for arbitrary jobs later is new surface, not a redesign.
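
The single-writer rule can be sketched as a small lease object: one holder at a time, with other jobs either queueing or falling back to a cold run. The class name, return values, and the per-request wait flag are assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class SingleWriterVolume:
    """Tracks which job currently holds a named cache volume."""

    def __init__(self, name: str):
        self.name = name
        self.holder: Optional[str] = None
        self.waiters: deque = deque()

    def request(self, job_id: str, wait: bool = False) -> str:
        """Return "warm" if the volume was granted, "queued" if the job
        chose to wait, or "cold" if it should run without the volume."""
        if self.holder is None:
            self.holder = job_id
            return "warm"
        if wait:
            self.waiters.append(job_id)
            return "queued"
        return "cold"

    def release(self, job_id: str) -> Optional[str]:
        """Release the volume and hand it to the next waiter, if any.

        Returns the new holder's job id, or None when the volume is free.
        """
        if self.holder != job_id:
            raise ValueError(f"{job_id} does not hold volume {self.name}")
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder
```

Copy-on-write snapshots or per-branch volumes would replace the "queued" and "cold" outcomes with a second warm path, which is the deferred part.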

Scope

In scope:

  • Remote container build as a customer-facing fleet workload: a kata-fc microVM builder on the Linux pool with a per-account cache volume.
  • The per-account cache-volume attach mechanism (named PVC on scw-local-nvme; affinity, quota, eviction, billing), built generic with the builder as first consumer.
  • The brokered per-account endpoint, account-token auth, per-account quota and concurrency, per-build billing, and the CLI surface customers use to target the builder.

Out of scope:

  • Exposing the cache-volume mechanism as a user-facing “mount any cache at any path into any job” feature. The mechanism is built generic here and is a planned customer product in its own right (see “Cache volumes as a product”); surfacing it for arbitrary jobs, and the copy-on-write concurrency a polished version needs, is an additive follow-on, not part of this RFC.
  • The mechanics of moving Tuist’s own deploy build onto the service. That is internal dogfooding (see Rollout), not part of the feature.
  • macOS/Xcode remote builds. Linux amd64 first; whether the same model later extends to Tart VMs is an open question.
  • Multi-arch images. amd64 first.
  • BYOC or self-hosted builders.

Trade-offs

Advantages

  1. Warm, cached container builds for customers on infrastructure we already own, sold alongside runners and cache as one platform, a category customers already pay other vendors for.
  2. No new security model: the builder inherits the runner microVM isolation and token split, so multi-tenancy is as safe as runners.
  3. Reuses the whole Runners substrate (Catalog, Profiles, Claims, RunnerSessions, the broker), so the net-new surface is the builder workload plus the cache-volume mechanism.
  4. Builds the cache-volume mechanism once, generically, so the broader cache product is additive.

Disadvantages

  1. A new stateful per-account volume class to operate: quota, eviction, capacity planning, and node-affinity scheduling on local NVMe.
  2. Cache affinity couples an account’s builds to the node holding its volume; node loss is a cold rebuild for that account.
  3. Warm-pool capacity is idle cost, the same trade-off the sandbox API carries.
  4. Multi-tenant blast radius is the microVM. Accepted, because it is exactly the boundary runners already trust.

Alternatives considered

A shared buildkitd daemon with namespaced cache

One long-lived daemon that many accounts’ builds hit, separated by cache-mount id. Rejected on security: buildkitd is a privileged daemon running arbitrary RUN steps and is not a hardened tenant boundary. A namespaced id is a string, not a kernel boundary, so it admits cache poisoning into another account’s image, a content-addressed dedup existence oracle across tenants, and shared exposure of in-flight build secrets. A long-lived daemon with a shared kernel and shared state also contradicts the runner posture we rely on everywhere else (one-shot, one microVM per job, guest treated as adversarial). The microVM-per-build model gives the warmth without the shared daemon.

Ephemeral builder with no persistent cache

Safe (isolated per build) but cold every time, which defeats the entire purpose. Rejected. The persistent per-account volume is the value.

One dedicated always-on builder per account

Simple isolation, but idle builders cost money and a low-volume account’s cache goes stale between builds. Rejected in favor of a shared capacity pool with a per-account warm volume, sticky scheduling to keep heavy accounts hot, and recycled compute for the long tail.

Registry-backed layer cache, the status quo

Tell customers to keep using a registry cache. Rejected as the product: it is coarse (no incremental compilation), pays a network push and pull every run, and is exactly the slow path this service exists to beat.

A managed third-party remote builder

Resell an external build service. Rejected: a vendor dependency and recurring cost for a capability we can run on the fleet we already own, and it forecloses the product we are actually trying to build.

Rollout

Internal dogfood (tenant zero). Run Tuist’s own image builds on the service under the tuist account as the first tenant, to exercise the builder, cache volume, broker, and billing end to end before any external account. A single trusted account does not exercise cross-tenant isolation, but the workload is built on the microVM primitive from day one, so opening it to customers enables that surface rather than retrofitting it. How Tuist’s CI is wired to the service is an internal detail, not part of this feature.

Private beta. Enable for a small set of accounts behind a per-account feature flag. Validate per-account isolation (no cross-account cache visibility), cache-volume quota and eviction, billing intervals, and quota enforcement.

GA. Warm-pool tuning, pricing surface, tuist CLI integration, and broader shapes.

Open questions

  1. Cache-volume affinity: pin an account’s builds to the node holding its local-NVMe volume, or support reattach/replication so a build can schedule elsewhere? Node loss is a cold rebuild either way; the question is scheduling flexibility versus simplicity.
  2. Warm-pool sizing per shape and the cold-start target for a build with no warm builder.
  3. Cache-volume quota, eviction policy, and the storage pricing line.
  4. CLI surface: raw docker buildx --driver remote with documented auth, a tuist wrapper, or both, and how buildx authenticates to the broker.
  5. macOS/Xcode remote builds on the same model with Tart VMs, or a separate effort.
  6. Multi-arch: an arm64 builder or emulation.
  7. Where the general cache-volume attach API lives so the later “any cache, any path, any job” product consumes it cleanly.
  8. Relationship to Tuist’s existing binary build cache: these are different layers (artifact cache versus container-build cache); the messaging needs to keep them distinct.

References

  1. Sandbox API RFC, the sibling workload-on-the-fleet pattern: spec #2
  2. Runner fleet, microVM isolation, token split, dind sidecar: infra/runners-controller/AGENTS.md
  3. Local-NVMe StorageClass: infra/helm/tuist/templates/kura-fleet-storage.yaml
  4. BuildKit remote driver and cache: https://docs.docker.com/build/builders/drivers/remote/ and https://docs.docker.com/build/cache/

Draft history

  • Revision 4: Proposed, edited by marek@tuist.dev
  • Revision 3: Proposed, edited by marek@tuist.dev
  • Revision 2: Proposed, edited by marek@tuist.dev
  • Revision 1: Proposed, edited by marek@tuist.dev