feat(server): tag-resolved deploys, stop committing to main

GitHub issue · Closed

Metadata

Source

tuist/tuist #11160

Updated

Jun 24, 2026

Domains

Compute Kura

Details

Summary

Redesigns the release pipeline so it never commits to main and so a fleet/runtime image (runner-image, linux-runner-image, controllers, Kura runtime, xcresult-processor) is built and deployed with the server in one pass — replacing the old “deploy server → release commits an image pin → server redeploys” double cascade.

Opened as a draft: this touches the most critical CI in the repo and can only be fully exercised once it is on main, so it needs careful review first.

Root cause (one diagnosis, both problems)

The deployed version of every image except the server itself lived as a committed pin in the managed Helm values (xcresultProcessor, capi/macosFleet, runnersController, kuraRuntime in values-managed-common.yaml; runnerImageSemver + runnersFleetLinux.shapeRunnerImage in the env overlays; the hetzner controller in the mgmt manifest). release.yml sed-rewrote those pins after building the images and pushed the bump to main. Because server-production-deployment.yml watched infra/helm/tuist/**, that bump commit re-triggered the whole canary → acceptance → production cascade.

So the committed pin was simultaneously (a) the commit to main and (b) the cause of the second deploy. The server image already avoided this — server.image.tag: "" resolved at deploy via --set from the commit SHA. This PR extends that model to every image.

While implementing, the committed pins turned out to have fallen behind the released tags (runnersController pinned 0.11.0 but the latest tag is 0.12.0; kuraRuntime 0.7.1 vs 0.7.4; xcresultProcessor 0.12.3 vs 0.12.4; runnerImageSemver 0.3.1 vs 0.3.2). Resolving from the tag removes that whole class of drift. Consequence to expect: the first deploy after merge rolls these forward to the latest already-released versions.
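To make the drift concrete, here is a minimal sketch of the comparison, using only the pinned and latest versions quoted above. The function names are illustrative and not part of the repo; the point is that versions must be compared numerically, not as strings.

```python
from __future__ import annotations


def parse_semver(version: str) -> tuple[int, int, int]:
    """Turn "0.12.0" into (0, 12, 0) so versions compare numerically."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def stale_pins(pins: dict[str, str], latest: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {component: (pinned, latest)} for every pin behind its latest tag."""
    return {
        name: (pinned, latest[name])
        for name, pinned in pins.items()
        if parse_semver(pinned) < parse_semver(latest[name])
    }


# Versions quoted in the description above.
PINNED = {
    "runnersController": "0.11.0",
    "kuraRuntime": "0.7.1",
    "xcresultProcessor": "0.12.3",
    "runnerImageSemver": "0.3.1",
}
LATEST = {
    "runnersController": "0.12.0",
    "kuraRuntime": "0.7.4",
    "xcresultProcessor": "0.12.4",
    "runnerImageSemver": "0.3.2",
}

if __name__ == "__main__":
    for name, (pinned, released) in stale_pins(PINNED, LATEST).items():
        print(f"{name}: pinned {pinned}, latest tag {released}")
```

All four pins come back as stale, which matches the roll-forward expected on the first deploy after merge.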

What changed

Phase 1 — deploy-time tag resolution

Managed values pins are set to "". server-deployment.yml resolves each version from the latest <component>@<semver> git tag (filtered with --merged against the deployed SHA, so re-promoting an old commit resolves versions as they were) and passes them via --set.
release.yml tags every fleet/runtime image it builds (tag-infra-releases).
The [Release] commit-and-push step and all sed-into-values pin-backs are removed.
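The resolution step can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's actual script: it takes the tag names that something like `git tag --merged <sha>` would list, picks the highest semver per component, and builds `--set` arguments. The helper names and the component-to-values-key mapping are hypothetical.

```python
from __future__ import annotations

import re

# Matches tags of the form <component>@<semver>, e.g. "runner-image@0.4.0".
TAG_PATTERN = re.compile(r"^(?P<component>[a-z0-9-]+)@(?P<version>\d+\.\d+\.\d+)$")


def resolve_version(component: str, merged_tags: list[str]) -> str | None:
    """Return the highest semver tagged for `component`, or None if it has no tag."""
    best = None
    for tag in merged_tags:
        match = TAG_PATTERN.match(tag)
        if match is None or match["component"] != component:
            continue
        version = tuple(int(part) for part in match["version"].split("."))
        if best is None or version > best:
            best = version
    return None if best is None else ".".join(str(part) for part in best)


def helm_set_args(value_keys: dict[str, str], merged_tags: list[str]) -> list[str]:
    """Build `--set key=version` pairs; `value_keys` maps component -> values key (assumed)."""
    args: list[str] = []
    for component, key in value_keys.items():
        version = resolve_version(component, merged_tags)
        if version is None:
            raise LookupError(f"no {component}@<semver> tag reachable from the deployed SHA")
        args += ["--set", f"{key}={version}"]
    return args


if __name__ == "__main__":
    # Hypothetical tag list, as it might be reachable from a deployed commit.
    tags = ["runner-image@0.3.1", "runner-image@0.3.2", "cli@4.1.0"]
    print(helm_set_args({"runner-image": "runnerImageSemver"}, tags))
```

Because only tags reachable from the deployed SHA are considered, feeding in the tag list of an older commit yields the versions that were current at that commit.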

Phase 2 — Sparkle appcast off `main`

SUFeedURL repointed from raw.githubusercontent.com/.../main/app/appcast.xml to a stable GitHub Release asset (releases/download/appcast/appcast.xml); the app release publishes the signed feed there (make_latest: false). The feed is a published artifact, not committed source. (Has a one-time post-merge migration — see below.)

Phase 3 — versions stamped, not committed

Each release job stamps its version into the build artifact before building (Constants.swift, Project.swift, Cargo.toml, mix.exs, Chart.yaml), so nothing needs committing back. The dead mise.toml/mise.lock mutation steps are removed.
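As a rough illustration of stamping, the sketch below rewrites the version line in the text of a Cargo.toml or Chart.yaml before a build. It is a simplification under assumptions: the real release jobs may stamp differently, the `0.0.0` placeholder is hypothetical, and the Cargo.toml rule assumes the `[package]` version is the first line-leading `version = "..."` in the file.

```python
import re

# Per-file rule: (pattern for the version line, template for its replacement).
RULES = {
    "Cargo.toml": (re.compile(r'(?m)^version\s*=\s*"[^"]*"'), 'version = "{version}"'),
    "Chart.yaml": (re.compile(r"(?m)^version:.*$"), "version: {version}"),
}


def stamp(filename: str, text: str, version: str) -> str:
    """Return `text` with its first version line set to `version`."""
    pattern, template = RULES[filename]
    replacement = template.format(version=version)
    # A function replacement avoids any backslash interpretation in the version string.
    stamped, count = pattern.subn(lambda _match: replacement, text, count=1)
    if count != 1:
        raise ValueError(f"no version line found in {filename}")
    return stamped


if __name__ == "__main__":
    cargo = '[package]\nname = "kura"\nversion = "0.0.0"\n'
    print(stamp("Cargo.toml", cargo, "0.7.4"))
```

The stamped text exists only in the build workspace, so nothing has to be committed back.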

Phase 4 — Renovate owns the in-source consumer pins

renovate.json: the tuist mise pin runs at any time (overriding a new weekly default for everything else) and auto-merges; the hetzner mgmt-manifest image and the dev.tuist Gradle plugin consumer pins (CLI Constants.swift, android settings.gradle.kts) are Renovate-tracked. These land via auto-merged PRs, not CI pushes — so they survive branch protection.

Deploy orchestration (how the pieces fit)

server-production-deployment.yml is the deploy. It triggers on push (server-relevant paths) and workflow_dispatch, both routed through a leading gate job: gate runs release:check and, on push, defers (skips the cascade) when a fleet/runtime image is releasing this push — otherwise it deploys now. It also resolves the deploy SHA and detects hotfix (commit subject on push, input on dispatch). Its own concurrency group keeps it off release.yml’s build queue and serializes cascades.
release.yml owns server/cache/kura/noora/helm/skills/gradle + the infra images, and dispatches server-production-deployment.yml (trigger-deploy) only when a fleet/runtime image released this run — that deploy must wait for the image build+tag, and gate defers to it. So no push deploys twice or is skipped.
cli-release.yml / app-release.yml — CLI and app build+publish in their own workflows with their own concurrency lanes, so a fleet-image deploy never queues behind a ~50-min CLI/app build. Each self-serializes its own component’s releases for version correctness.
notify-deploy-success.yml notifies Slack only when the run’s production deploy job (… / Deploy to tuist) actually succeeded, so a gate deferral (a green run that didn’t deploy) doesn’t false-fire.
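The "no push deploys twice or is skipped" claim can be checked with a small model of the two decision points. This is a sketch of the logic as described above, not the workflows themselves; the `Push` fields and function names are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Push:
    touches_server_paths: bool   # push matched the server-relevant path filter
    fleet_image_releasing: bool  # release:check reports a fleet/runtime image release


def gate_deploys_now(push: Push) -> bool:
    """gate job: deploy on push unless a fleet/runtime image is releasing (then defer)."""
    return push.touches_server_paths and not push.fleet_image_releasing


def release_dispatches_deploy(push: Push) -> bool:
    """release.yml trigger-deploy: dispatch only when a fleet/runtime image released."""
    return push.fleet_image_releasing


def deploy_count(push: Push) -> int:
    """Number of production cascades a single push starts under this model."""
    return int(gate_deploys_now(push)) + int(release_dispatches_deploy(push))


if __name__ == "__main__":
    for server, fleet in product([False, True], repeat=2):
        push = Push(server, fleet)
        print(push, "->", deploy_count(push), "deploy(s)")
```

Across all four combinations the count is at most one, and it is zero only when the push touches neither server paths nor a releasing fleet image.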

Why dispatch, not a needs: edge: a deploy inside release.yml would either let cancel-in-progress SIGKILL an in-flight helm --atomic deploy, or (with cancel off) serialize all builds behind every deploy. A separate dispatched run is immune to release-pipeline cancellation and preserves server-deployment.yml’s per-env serialization. release.yml itself is queue-not-cancel (serialize for version correctness, but never tear down an in-flight release/deploy).

Post-merge steps (cannot be done in this PR)

1. Sparkle appcast cutover

The new-feed code is in this PR, but the migration only executes at the first app release after merge — installed apps baked the old SUFeedURL and poll raw.githubusercontent.com/.../main/app/appcast.xml until they update to a build carrying the new URL. In order:

1. First app release after merge: app-release.yml builds the app (new SUFeedURL baked in) and publishes the signed feed to the appcast GitHub Release. The appcast release + feed don’t exist until this runs (the feed’s EdDSA signature is produced by that build).
2. Bridge the old feed (mandatory, one time): that same cutover build must also be advertised in the old committed app/appcast.xml, or existing installs never see a new-URL build and are stranded. Generate + commit it once (same invocation the workflow uses, signed with the Sparkle private key from 1Password):
```
generate_appcast --link https://github.com/tuist/tuist/releases \
  --download-url-prefix https://github.com/tuist/tuist/releases/download/<app@version>/Tuist.dmg \
  -o app/appcast.xml app/build/artifacts --ed-key-file -
```
3. Freeze: never update app/appcast.xml again. New releases publish only to the appcast release. An old install that checks later still finds the cutover build in the frozen feed → updates to it → follows the new feed thereafter (stepped update, no permanent stranding).
4. Then protect main (the frozen file just sits there; no further CI commits to it).
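The stepped update, and why the bridge is mandatory, can be illustrated with a toy model. Everything here is a simplification: real Sparkle update selection has more rules, and all version numbers are hypothetical. Only the two feed locations come from the description above.

```python
from __future__ import annotations

OLD_FEED = "raw.githubusercontent.com/.../main/app/appcast.xml"
NEW_FEED = "releases/download/appcast/appcast.xml"

Version = tuple[int, int, int]


def update_path(installed: Version, feeds: dict[str, list[Version]], cutover: Version) -> list[Version]:
    """Follow update checks until the install's own feed offers nothing newer."""
    path = [installed]
    while True:
        # Builds from the cutover onward carry the new SUFeedURL; older builds poll the old feed.
        feed_url = NEW_FEED if installed >= cutover else OLD_FEED
        latest = max(feeds[feed_url])
        if latest <= installed:
            return path
        installed = latest
        path.append(installed)


if __name__ == "__main__":
    cutover = (2, 0, 0)  # hypothetical first build with the new SUFeedURL
    bridged = {
        OLD_FEED: [(1, 4, 0), (1, 5, 0), cutover],  # frozen feed, cutover build advertised
        NEW_FEED: [cutover, (2, 1, 0), (2, 3, 0)],
    }
    print(update_path((1, 4, 0), bridged, cutover))
```

With the cutover build in the frozen feed, an old install steps to it and then follows the new feed. If the old feed stops short of the cutover build, the same install halts at the last old-feed version, which is the stranding case the bridge step prevents.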

2. Protect `main`

After confirming a real run no longer pushes to main, apply a ruleset on main — restrict creations/updates to PRs, block force pushes, require the conventional-pr + relevant checks. Tags bypass branch protection by design; Renovate/human PRs auto-merge through it. Repo setting, not code.

3. Renovate dry-run

renovate.json passes renovate-config-validator (schema). Before relying on auto-merge, confirm via a hosted Renovate dry-run that the mise manager resolves the tuist depName and the Gradle-plugin-portal datasource maps as configured.

Validation

All the touched workflows (release.yml, cli-release.yml, app-release.yml, server-deployment.yml, server-production-deployment.yml, notify-deploy-success.yml) parse as YAML and pass actionlint (only pre-existing self-hosted-label/shellcheck noise).
renovate.json validates with renovate-config-validator.
All seven <component>@<semver> tags exist, so the first deploy resolves real versions.
Not runnable end-to-end pre-merge: release.yml triggers on push to main, and the dispatched deploy uses main’s workflow version. Inherent to release/deploy-workflow changes, and the main reason this is a draft.

🤖 Generated with Claude Code

Comments

fortmarek Jun 8, 2026

Follow-up: decoupled server deploys from CLI/app release serialization

Per review feedback — a server deploy waiting behind an in-flight CLI/app release is wrong (they share no component, tag, or version).

Cause: not the job graph (within a run trigger-deploy already ignores the CLI/app build jobs), but release.yml’s workflow-level concurrency group: with cancel-in-progress: false, a server push’s entire run sits pending behind the in-flight CLI run, which contains the 50-min builds. That group can’t simply be dropped: version computation requires check-releases to run after the prior run’s tags exist.

Fix (commit 0a876fd): the server/chart-only deploy doesn’t need release.yml at all (it reads already-published infra tags), so it moves off the serialized lane:

server-deploy-dispatch.yml (new) triggers on the server-relevant paths, runs the same mise run release:check, and — when no fleet/runtime image is releasing this push — dispatches the deploy immediately on its own concurrency lane.
release.yml trigger-deploy narrows to fire only when a fleet/runtime image released (that deploy genuinely must wait for the image build + tag). deploy-gate removed.

Mutually exclusive across every case → no double / missed deploy:

| Push | Owner |
| --- | --- |
| server / chart / config, no fleet release | `server-deploy-dispatch.yml` (immediate) |
| fleet image releases (± server change) | `release.yml` `trigger-deploy` (after build) |
| fleet path touched but no release (scope mismatch) + server change | `server-deploy-dispatch.yml` (immediate) |
| docs-only | neither |

server-production-deployment.yml gains a concurrency group so both dispatch sources serialize into one cascade at a time without killing an in-flight helm --atomic deploy.

fortmarek Jun 9, 2026

Follow-up: fully decoupled — CLI/app releases extracted to their own workflows

Earlier I over-claimed “neither waits on the other.” Correcting that: server-deploy-dispatch.yml only decoupled server/chart-only deploys. A fleet-image deploy (runner-image, linux-runner, controllers, kura runtime, xcresult) still ran inside release.yml, so it queued behind any in-flight CLI/app release — because release.yml serializes all its runs under one concurrency group, and those runs contain the ~50-min CLI/app builds.

Root cause: the concurrency group is per-workflow, but the version-correctness it protects is per-component (cli@ and runner-image@ never collide). The group was broader than the invariant.

Fix (commit ea60fa7): CLI and app share no tag/version with anything else, so they move to their own push-triggered workflows with their own concurrency lanes:

cli-release.yml — build (macOS + Linux) + publish (GitHub Release, Homebrew formula).
app-release.yml — build (macOS/iOS/Android) + publish (GitHub Release, Sparkle appcast, Homebrew cask).
release.yml — now only server / cache / kura / noora / helm / skills / gradle / infra images + deploy orchestration.

Each workflow self-serializes its own component’s releases (version correctness preserved). Now a fleet-image release runs in release.yml without the CLI/app builds in the same lane, so its deploy no longer waits behind them.

Net deploy latency:

| Deploy trigger | Waits on |
| --- | --- |
| server / chart / config | nothing (own `server-deploy-dispatch.yml` lane) |
| fleet/runtime image | its own build in `release.yml`, no longer CLI/app |

CLI/app releases run fully in parallel with the server/infra pipeline and with each other.
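The effect of narrowing the concurrency group can be seen in a small queueing sketch. It assumes runs in the same group simply execute one after another, and ignores how GitHub Actions treats multiple pending runs in a group. The 50-minute CLI build is from the discussion above; the group names, arrival times, and the 10-minute fleet build are hypothetical.

```python
from __future__ import annotations

Run = tuple[str, str, int, int]  # (name, concurrency group, arrival minute, duration in minutes)


def start_times(runs: list[Run]) -> dict[str, int]:
    """Start minute of each run when runs sharing a group execute strictly in arrival order."""
    free_at: dict[str, int] = {}
    starts: dict[str, int] = {}
    for name, group, arrival, duration in sorted(runs, key=lambda run: run[2]):
        start = max(arrival, free_at.get(group, 0))
        starts[name] = start
        free_at[group] = start + duration
    return starts


if __name__ == "__main__":
    # Before: one workflow-level group covers both the CLI release and the fleet image.
    shared = [("cli", "release", 0, 50), ("fleet", "release", 5, 10)]
    # After: the CLI release has its own lane.
    split = [("cli", "cli-release", 0, 50), ("fleet", "release", 5, 10)]
    print("shared group:", start_times(shared))
    print("split groups:", start_times(split))
```

In the shared case the fleet run cannot start until the CLI run finishes; with separate lanes it starts on arrival, while each lane still serializes its own component.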

tuist[bot] Jun 9, 2026

🛠️ Tuist Run Report 🛠️

Previews 📦

App	Commit	Open on device
Tuist	5ca2bd564

Tests 🧪

Scheme	Status	Cache hit rate	Tests	Skipped	Ran	Commit
TuistApp	✅	33 %	28	0	28	5ca2bd564

Builds 🔨

Scheme	Status	Duration	Commit
TuistApp	✅	3m 19s	5ca2bd564

Bundles 🧰

Bundle	Commit	Install size	Download size
Tuist	5ca2bd564	19.5 MB `Δ -60.6 KB (-0.31%)`	14.7 MB `Δ -33.6 KB (-0.23%)`

tuist-atlas[bot] Jun 10, 2026

This change is now available in runner-image@0.4.0. Update to the runner image tags ghcr.io/tuist/tuist-runner:macos-26-5-0.4.0 or ghcr.io/tuist/tuist-runner:macos-26-4-1-0.4.0 to use it.