feat(server): gate runner availability solely on the :runners feature flag
Remove the per-account runner_max_concurrent concurrency cap. Whether managed GitHub Actions runners are available for an account is now determined solely by the :runners FunWithFlags feature flag (Tuist.FeatureFlags.runners_enabled?/1) — always-on outside prod, per-account toggle in prod. There is no concurrency limit anymore.
Why
The cap column conflated two concerns into one integer: "is this customer allowed to use runners at all?" and "how many can run concurrently?". We don’t want a concurrency limit for now, and availability is already modelled by the :runners feature flag everywhere else (the UI/LiveView guards all use runners_enabled?/1). This collapses gating to a single source of truth.
What changed
- Dispatch (Dispatch.handle_webhook) rejects with :runners_disabled when runners_enabled?/1 is false, instead of when the cap was 0. The account-by-handle lookup cache now caches every resolved account; enablement is evaluated per webhook against the flag (which has its own cache + invalidation), so a flag flip still takes effect immediately without waiting out the cache TTL.
- Claims.attempt/4 no longer fetches or checks the cap. It keeps the advisory lock (now purely for pod-in-use atomicity), the pod-in-use check, and the INSERT … ON CONFLICT race collapse. Dropped the :over_cap, :runners_disabled, and :unknown_account claim outcomes.
- Dispatch pre-filter: removed ineligible_accounts/at_cap? from Tuist.Runners; pick_queued is always called with an empty exclusion set.
- Schema/DB: dropped the accounts.runner_max_concurrent column (migration, remove_if_exists + safety-assured, reversible down) and the Ecto schema field.
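The dispatch gating described above can be modelled in a few lines. This is a minimal Python sketch for illustration only, not the Elixir implementation: the `Dispatcher` and `FeatureFlags` classes, the `ttl_seconds` parameter, and the `"ignored"` and `"enqueued"` return values are hypothetical names; only `:runners_disabled` and the "always on outside prod, per-account in prod" rule come from this PR.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Account:
    id: int
    handle: str


@dataclass
class FeatureFlags:
    """Hypothetical stand-in for the :runners flag."""
    env: str
    enabled_account_ids: set = field(default_factory=set)

    def runners_enabled(self, account: Account) -> bool:
        # Always on outside prod; per-account toggle in prod.
        if self.env != "prod":
            return True
        return account.id in self.enabled_account_ids


class Dispatcher:
    def __init__(self, accounts_by_handle, flags, ttl_seconds=60.0):
        self._accounts = accounts_by_handle  # hypothetical backing store
        self._flags = flags
        self._ttl = ttl_seconds
        self._cache = {}  # handle -> (account, expires_at)
        self.queue = []

    def _lookup(self, handle):
        now = time.monotonic()
        hit = self._cache.get(handle)
        if hit is not None and hit[1] > now:
            return hit[0]
        account = self._accounts.get(handle)
        if account is not None:
            # Every resolved account is cached, whether or not runners are enabled.
            self._cache[handle] = (account, now + self._ttl)
        return account

    def handle_webhook(self, handle, job_id):
        account = self._lookup(handle)
        if account is None:
            return "ignored"  # hypothetical outcome for an unresolved handle
        # The flag is checked on every webhook and is never stored in the
        # account cache, so a flag flip does not wait for the TTL to expire.
        if not self._flags.runners_enabled(account):
            return "runners_disabled"
        self.queue.append((account.id, job_id))
        return "enqueued"
```

In this model, an account that was cached while its flag was off starts dispatching on the very next webhook after the flag is turned on, because only the account lookup is cached and the enablement decision is not.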
Claims.counts_per_account/0 and pick_queued's ineligible-account param are deliberately kept as generic, tested primitives (callers now pass []) rather than churning ~40 unrelated test call sites; they’re the natural building blocks if a cap ever returns.
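For reviewers less familiar with the claim path, here is a rough model of what remains in Claims.attempt/4 once the cap check is gone: a lock, a pod-in-use check, and an insert that collapses races. It is a sketch under stated assumptions, not the real code. It uses SQLite through Python's standard library, with `BEGIN IMMEDIATE` standing in for the advisory lock; the `claims` table layout, the function signature, and the outcome names `claimed`, `pod_in_use`, and `already_claimed` are all hypothetical.

```python
import sqlite3


def open_db():
    # isolation_level=None: transactions are managed explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE claims (job_id TEXT PRIMARY KEY, pod_id TEXT NOT NULL)"
    )
    return conn


def attempt(conn, job_id, pod_id):
    # Takes the write lock up front; a stand-in for the advisory lock that
    # makes "check pod, then insert" atomic.
    conn.execute("BEGIN IMMEDIATE")
    try:
        busy = conn.execute(
            "SELECT 1 FROM claims WHERE pod_id = ?", (pod_id,)
        ).fetchone()
        if busy:
            outcome = "pod_in_use"
        else:
            # ON CONFLICT needs SQLite 3.24 or newer. A second claimant for
            # the same job inserts nothing instead of raising an error.
            conn.execute(
                "INSERT INTO claims (job_id, pod_id) VALUES (?, ?) "
                "ON CONFLICT (job_id) DO NOTHING",
                (job_id, pod_id),
            )
            owner = conn.execute(
                "SELECT pod_id FROM claims WHERE job_id = ?", (job_id,)
            ).fetchone()[0]
            outcome = "claimed" if owner == pod_id else "already_claimed"
        conn.execute("COMMIT")
        return outcome
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Note that nothing in this model counts claims per account, which mirrors the point of the PR: the only remaining reasons for a claim to fail are that the pod is busy or that another pod won the job.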
Also updated the tests (claims/dispatch/prom/controller/profiles-live, plus test_helper.exs now copies Tuist.FeatureFlags for stubbing), dev seeds, server/data-export.md, and the stale infra comments that referenced the removed column (helm values + server deployment, two runners-controller Go docs, the Grafana panel description).
Reviewer notes
- Enablement criterion shifts: prod dispatch previously worked when runner_max_concurrent > 0; it now requires the :runners flag. The flag must be set for any account that should have runners in prod.
- Dropping the column directly (rather than expand/contract) is safe here because the runners feature is only enabled for the Tuist org, so there is no other live consumer of the column to break during the rolling-deploy window.
How to test locally
- cd server && mix test test/tuist/runners/ test/tuist_web/controllers/runners_controller_test.exs test/tuist_web/live/runner_profiles_live_test.exs (all green).
- mix excellent_migrations.check_safety (the new migration is clean); mix ecto.migrate / ecto.rollback apply and reverse cleanly.
- With the :runners flag enabled for an account (always-on in dev), a workflow_job: queued webhook enqueues and a polling Pod claims it with no concurrency ceiling. With the flag off (prod only), dispatch returns :runners_disabled.