
feat(server): move runners to the tuist-linux label and default the profile to 2 vCPU / 8 GB

GitHub issue · Closed

Metadata
Source: tuist/tuist #11052
Updated: Jun 24, 2026
Domains: Compute

Overview

Three related changes that retire the transitional tuist-production-linux label in favor of tuist-linux and give the Linux runner profile a leaner default shape:

  1. Workflows: every active CI job moves from runs-on: tuist-production-linux to runs-on: tuist-linux (38 jobs), and the TODO comments on parked jobs are updated (25).
  2. Server: the now-dead tuist-production-linux legacy alias path is removed from dispatch.
  3. Default shape: the Linux runner profile's default changes from 4 vCPU / 16 GB to 2 vCPU / 8 GB, across the dev/test catalog and the three managed Helm envs.

The bulk of the discussion below is about (3), since that is the consequential product decision.


1. Workflow label move

tuist-linux is the production runs-on: label for the protected linux default profile (production prefix tuist- plus profile name linux); tuist-production-linux was a transitional alias. This PR repoints all active jobs to tuist-linux and updates the TODO comments on jobs still parked on namespace-profile-* runners, which are blocked until the runner image ships a Docker daemon. Those parked jobs stay where they are; only the destination label named in their TODOs changes.

2. Legacy alias removal

With no workflow referencing tuist-production-linux anymore, the @legacy_linux_label_by_env alias guards nothing. It was the only alias-only label: the canary and staging labels are identical to the protected linux profile’s label on those envs, so they already resolve via the profile path. Every account has that profile (backfill migration 20260602103000 for existing accounts, Profiles.create_default_for_account for new ones), so dispatch resolution collapses to two steps: the account’s profile, then the legacy pool. The legacy pool path stays: it is the live resolution path for the macOS fleet (tuist-macos), not dead code.
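A minimal Go sketch of the collapsed resolution order, for illustration only. The real dispatch code lives in the Elixir server; the Profile type, the labelFor and resolve helpers, and the legacy pool map here are hypothetical. It assumes a profile's label is the env prefix plus the profile name, as described for tuist-linux above, and only uses the production prefix.

```go
package main

import "fmt"

// Profile is a hypothetical stand-in for an account's runner profile.
type Profile struct {
	Name      string
	Protected bool
}

// labelFor composes a runs-on label from an env prefix and a profile name,
// e.g. "tuist-" + "linux" = "tuist-linux" in production.
func labelFor(envPrefix, profileName string) string {
	return envPrefix + profileName
}

// resolve sketches the two-step order: an account profile whose label
// matches, otherwise a legacy pool (such as the macOS fleet). There is no
// env-specific alias step anymore.
func resolve(envPrefix, label string, profiles []Profile, legacyPools map[string]string) (string, bool) {
	for _, p := range profiles {
		if labelFor(envPrefix, p.Name) == label {
			return "profile:" + p.Name, true
		}
	}
	if pool, ok := legacyPools[label]; ok {
		return "pool:" + pool, true
	}
	return "", false
}

func main() {
	profiles := []Profile{{Name: "linux", Protected: true}}
	pools := map[string]string{"tuist-macos": "macos"} // hypothetical pool name

	for _, label := range []string{"tuist-linux", "tuist-macos", "tuist-production-linux"} {
		target, ok := resolve("tuist-", label, profiles, pools)
		fmt.Println(label, "->", target, ok)
	}
}
```

With the alias gone, tuist-production-linux simply fails to resolve, which is safe only because no workflow references it.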

3. Default Linux profile shape: 4/16 to 2/8

The question

The linux default profile is what most jobs inherit, so its shape is the one that matters most. The prior default was 4 vCPU / 16 GB. The question was whether a leaner default would be better and, if so, which shape.

Competitor landscape

Default / entry-level Linux runner specs:

Provider | Default tier | vCPU | RAM | RAM per vCPU
--- | --- | --- | --- | ---
GitHub Actions (private repos) | ubuntu-latest | 2 | 8 GB | 4 GB
GitHub Actions (public repos) | ubuntu-latest | 4 | 16 GB | 4 GB
GitLab SaaS (default) | saas-linux-small | 2 | 8 GB | 4 GB
CircleCI (default) | medium | 2 | 4 GB | 2 GB
Blacksmith / BuildJet / Ubicloud | entry | 2 | 8 GB | 4 GB
Namespace (bare nscloud) | default | 4 | 4 GB | 1 GB

The dominant convention is 4 GB per vCPU, i.e. 8 GB at 2 cores. Namespace badges its 2 GB/vCPU shapes as “optimal”, but that badge tracks Namespace’s own bare-metal ratio: it is a supply-side signal about its hardware, not a universal statement of what jobs need. Even Namespace defaults its profile editor to a more generous 4 GB/vCPU shape.

Why 2/8

  • Migration parity and OOM safety. 2 vCPU / 8 GB is an exact match for GitHub’s private-repo ubuntu-latest, the like-for-like target for jobs moving off GitHub-hosted runners, so those jobs keep the 8 GB they were written against. A leaner 2/4 default (matching CircleCI and Namespace’s “optimal” ratio) risks OOM on memory-heavy jobs that pass on a stock GitHub runner.
  • It fits the bare metal we already run. Production runner pods land on Hetzner AX162-R hosts (96 threads, 256 GB, so 2.67 GB per thread). At 4 GB/vCPU, 2/8 is memory-bound at 32 runners/host (256 / 8), using 64 of 96 threads. The idle third of the CPU is not a stranded reservation; it is burst headroom. This is the same 4 GB/vCPU balance the chart’s reference shape already used, so no infra change is needed.
  • It matches how the scheduler actually packs. The runners autoscaler bin-packs on memory only (Kata pins memory per sandbox; CPU is deliberately oversubscribed, see internal/scaling/allocate.go). For a memory-bound shape (at least 2.67 GB/vCPU, the host ratio) the memory-based replica count equals what the kube-scheduler can place, so the two agree. A default below 2.67 GB/vCPU (like 2/4) would break that: the memory-only allocator would plan more runners per host than the scheduler can place, leaving pods Pending on hosts that still have free RAM. Defaulting to 2/8 avoids having to teach the allocator about CPU.
  • It is cheaper than the old default. Versus 4/16 it doubles density (16 to 32 runners/host) and roughly halves the metal cost per concurrent runner, while still clearing the 8 GB floor.
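The packing arithmetic in the list above can be checked with a small Go sketch. It assumes the kube-scheduler counts each runner's full vCPU request against the host's 96 threads (which is where the 48-per-host figure for 2/4 comes from) and models the autoscaler as sizing hosts by memory alone. schedulable, memoryOnly, and pendingFor are hypothetical helpers, not the code in internal/scaling/allocate.go.

```go
package main

import "fmt"

// Host capacity for a Hetzner AX162-R, as quoted above.
const (
	hostThreads = 96
	hostRAMGB   = 256
)

// Shape is a runner size: vCPU request and RAM in GB.
type Shape struct {
	VCPU, RAMGB int
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// schedulable is how many runners fit on one host when both CPU and memory
// requests are honored (assumption: full vCPU requests count).
func schedulable(s Shape) int {
	return minInt(hostRAMGB/s.RAMGB, hostThreads/s.VCPU)
}

// memoryOnly is the per-host count a memory-only allocator plans with.
func memoryOnly(s Shape) int {
	return hostRAMGB / s.RAMGB
}

// pendingFor sizes hosts for n runners by memory alone, then reports how
// many runners the scheduler cannot place on those hosts.
func pendingFor(s Shape, n int) (hosts, pending int) {
	perHost := memoryOnly(s)
	hosts = (n + perHost - 1) / perHost // ceiling division
	placed := minInt(n, hosts*schedulable(s))
	return hosts, n - placed
}

func main() {
	base := schedulable(Shape{VCPU: 4, RAMGB: 16})
	for _, s := range []Shape{{4, 16}, {2, 8}, {2, 4}} {
		hosts, pending := pendingFor(s, 64)
		fmt.Printf("%d/%d: %d per host (cost per runner %.2fx of 4/16); 64 runners -> %d host(s), %d Pending\n",
			s.VCPU, s.RAMGB, schedulable(s), float64(base)/float64(schedulable(s)), hosts, pending)
	}
}
```

Under these assumptions only the 2/4 row leaves runners Pending (16 of 64, with 64 GB of RAM still free on the host), while 2/8 packs twice as densely as 4/16 at half the per-runner host cost.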

Alternatives considered

  • 2 vCPU / 4 GB (the leaner option): packs more runners per host (48 vs 32) and fits the AX162-R ratio more tightly, but flips the bottleneck to CPU, leaves little memory headroom for memory-heavy jobs, and would require allocator changes to avoid Pending pods. Rejected as the default; still available as an opt-in shape.
  • Ordering more host RAM to pack 2/8 perfectly (384 GB for 4 GB/thread): rejected on cost. Hetzner prices RAM upgrades at a premium (around 66 EUR/month per 32 GB), so +128 GB roughly doubles the server’s monthly cost for +50% slots (48 instead of 32). The upgrade only pays off when the base host costs more than ~528 EUR/month; the AX162-R is far below that, so scaling out with stock 256 GB boxes is cheaper per runner. The stock box is well matched to a mixed fleet and mismatched only to a monolithic 2/8 fleet, where the cure is the shape mix and bin-packing, not paid RAM.
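The ~528 EUR break-even comes from comparing monthly cost per concurrent 2/8 runner on a stock box (32 slots) and an upgraded one (48 slots), using the RAM price quoted above. A Go sketch of that arithmetic; the base prices in main are illustrative, not Hetzner quotes.

```go
package main

import "fmt"

const (
	stockSlots    = 32       // 256 GB / 8 GB per 2/8 runner
	upgradedSlots = 48       // 384 GB / 8 GB (also 96 threads / 2 vCPU)
	upgradeEUR    = 4 * 66.0 // +128 GB at ~66 EUR/month per 32 GB
)

// perRunner returns the monthly cost per concurrent 2/8 runner on a stock
// and on a RAM-upgraded host, given the base host price.
func perRunner(baseEUR float64) (stock, upgraded float64) {
	return baseEUR / stockSlots, (baseEUR + upgradeEUR) / upgradedSlots
}

// breakEven solves base/32 == (base+upgrade)/48 for base:
// 48*base = 32*base + 32*upgrade, so base = 32*upgrade / 16.
func breakEven() float64 {
	return stockSlots * upgradeEUR / (upgradedSlots - stockSlots)
}

func main() {
	fmt.Printf("break-even base price: %.0f EUR/month\n", breakEven())
	// 264 EUR is the base price at which the upgrade exactly doubles the
	// monthly cost; 600 EUR is a hypothetical price above break-even.
	for _, base := range []float64{upgradeEUR, 600} {
		stock, upgraded := perRunner(base)
		fmt.Printf("base %.0f EUR: stock %.2f, upgraded %.2f EUR/runner/month\n", base, stock, upgraded)
	}
}
```

Below break-even the stock box wins per runner; the upgrade only comes out ahead for hosts that are already expensive.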

Warm floors

Warm floors stay as slot counts (30 in prod, 2 each in canary and staging). A floor absorbs N concurrent cold starts regardless of shape size, so the same count now holds 2/8 runners instead of 4/16: half the RAM per warm slot, and roughly a host’s worth of warm capacity freed in prod (30 × 8 GB = 240 GB).
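Warm-floor RAM is slot count times per-runner memory, so the saving can be tallied per env. A short Go sketch using the floors above; the env keys are illustrative labels, and the host size is the AX162-R's 256 GB.

```go
package main

import "fmt"

const hostRAMGB = 256 // one AX162-R

// warmRAM is the RAM a warm floor holds: slot count times per-runner RAM.
func warmRAM(slots, ramGB int) int { return slots * ramGB }

func main() {
	envs := []string{"prod", "canary", "staging"}
	floors := map[string]int{"prod": 30, "canary": 2, "staging": 2}
	for _, env := range envs {
		before, after := warmRAM(floors[env], 16), warmRAM(floors[env], 8)
		fmt.Printf("%s: %d GB -> %d GB, %.2f hosts' worth freed\n",
			env, before, after, float64(before-after)/hostRAMGB)
	}
}
```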

Scope

This changes the default going forward: new accounts (create_default_for_account reads Catalog.default()), the warm pool, and the shape preselected in the new-profile form. Existing auto-created linux profiles are not reshaped: a protected row at 4/16 is indistinguishable from one a customer deliberately set to 4/16, so shrinking it to 8 GB could OOM their jobs. The backfill migration from #10970 is left untouched, and no new migration is added.
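A hypothetical Go sketch of the default lookup: pick the one catalog shape flagged as default, fail if there is not exactly one, and copy it only into newly created profiles. The JSON field names and shape names are assumptions rather than the real catalog schema, and defaultShape / profileForNewAccount merely stand in for Catalog.default() and create_default_for_account.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shape is a hypothetical catalog entry; these JSON field names are
// assumptions, not the real shapes schema.
type Shape struct {
	Name    string `json:"name"`
	VCPU    int    `json:"vcpu"`
	RAMGB   int    `json:"ram_gb"`
	Default bool   `json:"default"`
}

// Profile is a hypothetical stored linux profile.
type Profile struct {
	VCPU, RAMGB int
}

// defaultShape returns the one entry flagged as default and rejects a
// catalog with zero or several defaults.
func defaultShape(raw []byte) (Shape, error) {
	var shapes []Shape
	if err := json.Unmarshal(raw, &shapes); err != nil {
		return Shape{}, err
	}
	var defaults []Shape
	for _, s := range shapes {
		if s.Default {
			defaults = append(defaults, s)
		}
	}
	if len(defaults) != 1 {
		return Shape{}, fmt.Errorf("want exactly one default shape, got %d", len(defaults))
	}
	return defaults[0], nil
}

// profileForNewAccount copies the catalog default into a new profile.
// Existing profiles never pass through here, so a stored 4/16 row keeps
// whatever shape it already has.
func profileForNewAccount(def Shape) Profile {
	return Profile{VCPU: def.VCPU, RAMGB: def.RAMGB}
}

func main() {
	catalog := []byte(`[
		{"name": "linux-2x8", "vcpu": 2, "ram_gb": 8, "default": true},
		{"name": "linux-4x16", "vcpu": 4, "ram_gb": 16},
		{"name": "linux-2x4", "vcpu": 2, "ram_gb": 4}
	]`)
	def, err := defaultShape(catalog)
	if err != nil {
		panic(err)
	}
	fmt.Printf("new accounts get %+v\n", profileForNewAccount(def))
}
```

Keeping the "exactly one default" check in the lookup means a values file with a missing or duplicated default fails loudly instead of silently picking a shape.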


Validation

  • grep confirms zero tuist-production-linux references under .github/workflows/ and server/.
  • Elixir files parse (Code.string_to_quoted!); the three managed values files parse as YAML and each has exactly one default: true, on the 2/8 shape, with warm floors preserved (30 / 2 / 2).
  • No test reads the real catalog default (the parse_shapes_json cases use their own JSON input; profiles_test stubs the catalog), so the change needs no test updates. Full mix test not run locally: fresh worktree without deps/, _build/, or the 1Password dev.key; CI covers the suite.

🤖 Generated with Claude Code

Comments
fortmarek Jun 3, 2026

Merging now so we unblock queued jobs: since we changed our default pool configuration, they now hit the pool that doesn’t have a ton of space to operate.

tuist-atlas[bot] Jun 4, 2026

The runner label migration to tuist-linux and the new default profile shape (2 vCPU / 8 GB) are now available in server@1.205.0. Update to this version to adopt the new defaults.

tuist-atlas[bot] Jun 5, 2026

The changes from this PR are now available in release xcresult-processor-image@0.11.0. Runners have moved to the tuist-linux label and the default profile is now 2 vCPU / 8 GB.