Hive Hive
Sign in

feat(server): split database runtime and migration roles

GitHub issue · Closed

Metadata
Source
tuist/tuist #11408
Updated
Jun 24, 2026
Details

Summary

This PR introduces the first step of the database hardening plan: splitting the database role used for schema ownership and migrations from the role used by the running Tuist web application.

Today, managed CNPG deployments expose the CNPG-generated <cluster>-app Secret to both the migration Job and server pods. That Secret belongs to the postgresql.cnpg.owner role (tuist_app by default), which is the role created during bootstrap.initdb and the owner of objects produced by Ecto migrations. In practice, the web runtime therefore runs with the same privileges needed to create and evolve schema objects.

That works operationally, but it is broader than the runtime needs. It also makes it difficult to support customers with stricter database controls, because there was no first-class way to say:

  • migrations should run as the schema owner
  • the web application should run as a narrower DML role
  • migrations should keep that runtime role up to date as tables, sequences, and functions change

This PR adds that split while keeping the rollout safe for existing managed deployments.

Motivation

This came from the investigation into whether Tuist can avoid requiring broad access to PostgreSQL’s default public schema for customers with stricter internal policies.

The important distinction is:

  • using the public schema is not automatically a security smell
  • using an application runtime role with schema ownership / DDL privileges is a much stronger security smell

Custom schema support is still a separate concern and is not implemented here. Before adding custom-schema support, we should make the role model safer and more explicit. Otherwise a custom schema would still be accessed by an overly powerful runtime role, just in a different namespace.

This PR therefore focuses on the foundational role boundary:

  • tuist_app: owner / migration role
  • tuist_web: web runtime role
  • tuist_processor: existing least-privilege processor role
  • tuist_ops_ro: existing read-only operator role

That gives us a safer database posture now and a cleaner foundation for custom schema support later.

Root Cause / Previous Behavior

This is not a bugfix for a production incident, but it does fix a structural limitation in our deployment model.

Before this PR:

  • Tuist.Release.migrate/0 loaded the application and ran migrations using the configured DATABASE_URL.
  • In managed CNPG mode, both the migration Job and server Deployment read DATABASE_URL from the CNPG owner Secret (<cluster>-app).
  • There was no TUIST_MIGRATION_DATABASE_URL override for release migrations.
  • There was no post-migration grant step to keep a runtime role aligned with newly-created tables, sequences, or functions.
  • The managed chart already had narrower roles for processor and Ops read-only access, but not for the web runtime.

The observed limitation was that a deployment could not safely set DATABASE_URL to a narrow web role and still run migrations, because migrations need ownership/DDL privileges and future objects would not automatically be granted back to the runtime role.

What Changed

Release Migration Role Support

Tuist.Release.migrate/0 now supports TUIST_MIGRATION_DATABASE_URL.

When set, the release task:

  • forces the migration pool size to 1, as before
  • uses TUIST_MIGRATION_DATABASE_URL as the effective migration URL
  • updates the loaded Tuist.Repo runtime config so release migrations really use the migration URL even when the release was booted with a narrower DATABASE_URL
  • leaves other repos untouched, including Tuist.IngestRepo

This matters because release config is evaluated before the migration function body in some boot paths. Merely setting DATABASE_URL inside the function is not enough; the already-loaded Repo config must also be updated.

Runtime Role Grants

Tuist.Release.migrate/0 now optionally reads TUIST_DATABASE_RUNTIME_ROLE.

When set, after PostgreSQL migrations complete, it grants the runtime role:

  • CONNECT on the application database
  • USAGE on public
  • no CREATE on public
  • SELECT, INSERT, UPDATE, DELETE on all existing tables in public
  • USAGE, SELECT on all existing sequences in public
  • EXECUTE on all existing functions in public
  • read-only access to public.schema_migrations
  • default privileges for future tables, sequences, and functions

It also revokes CREATE from PUBLIC on the database/schema path as defense in depth.

The runtime role name is validated as an unquoted PostgreSQL identifier before being interpolated, and identifiers are quoted before SQL execution.

Runtime Database URL Parsing

The chart composes database URLs with URL-encoded passwords. Tuist.Environment now centralizes database URL parsing so runtime boot and release migrations decode credentials the same way before passing them to Postgrex.

This is important for generated passwords containing @, :, /, ?, or similar URL-significant characters. The processor path already encoded passwords in URLs; the new web runtime URL follows the same pattern.

CNPG Web Runtime Role

The Helm chart now declares a CNPG-managed web runtime role:

postgresql:
cnpg:
roles:
web:
name: tuist_web

CNPG creates and reconciles that role the same way it already handles tuist_processor and tuist_ops_ro.

The chart also adds a WEB_DATABASE_PASSWORD ExternalSecret, producing the managed role Secret:

  • username
  • password
  • uri

The uri points at the CNPG -rw Service and uses the tuist_web role.

Managed Server / Migration Wiring

In managed CNPG mode:

  • migration jobs continue to use the CNPG owner Secret (<cluster>-app)
  • server pods can switch to the tuist_web Secret
  • TUIST_USE_SSL_FOR_DATABASE=0 remains unchanged for in-cluster CNPG connections
  • self-hosted/unmanaged CNPG installs keep the previous owner-Secret behavior by default

The migration Job also exports TUIST_MIGRATION_DATABASE_URL from the owner Secret, so the release task has an explicit migration URL even if the runtime URL is different.

Rollout Strategy

The rollout sequence is the trickiest part of this change because our managed migration Job runs as a Helm pre-upgrade hook.

Helm pre-upgrade hooks run before normal resources are applied. That means the first deployment containing this chart change cannot assume that CNPG has already reconciled the new tuist_web managed role and Secret.

Preflight: Create the 1Password Items

Before deploying this PR to a managed CNPG environment, create a WEB_DATABASE_PASSWORD item in that environment’s 1Password vault:

  • tuist-k8s-staging
  • tuist-k8s-canary
  • tuist-k8s-production

The item needs a password field with a strong generated password. The chart references it through:

postgresql:
cnpg:
externalSecrets:
webPassword:
item: WEB_DATABASE_PASSWORD
field: password

The onepassword ClusterSecretStore is scoped per environment to the matching tuist-k8s-<env> vault, so the item name can be the same in each vault. Preview currently uses embedded Postgres, so it does not need this item for this rollout.

Status: the required WEB_DATABASE_PASSWORD 1Password items have been created in the staging, canary, and production environment vaults with generated 64-character passwords. The generated password values were not printed or copied into the PR.

If this item is missing, the ExternalSecret for pg-tuist-web will not become Ready, CNPG will not be able to reconcile the tuist_web password Secret, and the rollout gate should remain closed.

To avoid a first-upgrade failure, the chart includes a rollout gate:

postgresql:
cnpg:
roles:
web:
useForServer: auto

auto means:

  1. First deploy:

    • render the CNPG-managed tuist_web role and ExternalSecret
    • keep server pods on the owner Secret
    • do not ask the migration Job to grant tuist_web yet
  2. CNPG/ESO reconcile:

    • the tuist_web role exists
    • the pg-tuist-web Secret exists
  3. Next deploy:

    • the chart detects the existing web-role Secret
    • the migration Job sets TUIST_DATABASE_RUNTIME_ROLE=tuist_web
    • migrations apply grants to tuist_web
    • server pods switch to the tuist_web runtime URL

Operators can also set postgresql.cnpg.roles.web.useForServer=true to force the switch after pre-provisioning the role/Secret, or false to keep using the owner Secret.

Security Model After This PR

After the rollout gate opens, managed CNPG has this intended split:

Role Purpose Privilege Shape
tuist_app Migrations / schema ownership Owns Ecto-created objects and runs schema changes
tuist_web Web runtime DML on app tables, sequence usage, function execution, read-only schema_migrations, no schema creation
tuist_processor Dedicated Oban processor Existing narrow grants for oban_jobs, oban_peers, and required read-only lookup tables
tuist_ops_ro Operator inspection Read-only operator role via pg_read_all_data plus explicit write revokes

This reduces the blast radius of a web-runtime credential compromise. It does not claim to solve every database isolation concern:

  • it does not add custom schema support yet
  • it does not remove the public schema from existing deployments
  • it does not make the runtime role read-only, because the app legitimately writes application data
  • it does not change ClickHouse / Tuist.IngestRepo behavior

Customer Impact

For Tuist-managed environments, this is a safe phased hardening path. Once rolled out, the web application no longer needs to connect as the schema owner.

For self-hosted operators, this adds the primitives needed to run a split-role setup:

  • set DATABASE_URL to the runtime role URL
  • set TUIST_MIGRATION_DATABASE_URL to the owner/migration role URL when running release migrations
  • set TUIST_DATABASE_RUNTIME_ROLE to the runtime role name so migrations maintain grants

Existing single-role setups continue to work because the new environment variables are optional.

For Toss specifically, this PR does not yet provide custom schema support. It does make the database story safer and moves us toward the model they likely need: a runtime role scoped to exactly the privileges Tuist needs, followed by custom-schema support in a separate PR.

structure.sql and Custom Schema Follow-Up

The committed PostgreSQL structure dump remains role-neutral in this PR. It does not include CREATE ROLE, OWNER TO, GRANT, REVOKE, or ALTER DEFAULT PRIVILEGES statements, so it should not bake managed role names such as tuist_app or tuist_web into fresh database bootstrap.

With the role split introduced here, fresh database bootstrap still works as:

  1. connect with the owner/migration role
  2. load structure.sql if the database is empty
  3. run pending migrations
  4. apply the runtime grants to TUIST_DATABASE_RUNTIME_ROLE

That means the runtime role grant step covers objects loaded from structure.sql as well as objects created by migrations.

Custom schema support should be handled in a follow-up PR. The current structure.sql is schema-qualified with public., so a future custom-schema implementation should not assume that changing search_path is enough. The safer initial behavior for non-public schemas is likely to skip the current structure dump and run migrations from scratch, unless we later add a proper schema-aware dump/remapping path.

Files Worth Reviewing Closely

  • server/lib/tuist/release.ex

    • migration URL override
    • runtime role grant SQL
    • runtime role identifier validation
  • server/lib/tuist/environment.ex

    • TUIST_MIGRATION_DATABASE_URL and TUIST_DATABASE_RUNTIME_ROLE accessors
    • shared database URL parser used by runtime config and release migrations
  • server/config/runtime.exs

    • consumes the shared parser before passing DB credentials to Postgrex
  • infra/helm/tuist/templates/_helpers.tpl

    • tuist.cnpgUseWebRole rollout gate
  • infra/helm/tuist/templates/postgresql-cnpg.yaml

    • CNPG-managed tuist_web role
    • WEB_DATABASE_PASSWORD ExternalSecret
  • infra/helm/tuist/templates/server-deployment.yaml

    • gated switch from owner Secret to web runtime Secret
  • infra/helm/tuist/templates/server-migration-job.yaml

    • migration job remains on owner Secret
    • runtime grants only enabled when the gate is open

Validation

Ran:

helm lint infra/helm/tuist

Rendered the focused CNPG path with the default rollout gate:

helm template tuist infra/helm/tuist \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set postgresql.mode=cnpg \
--set postgresql.cnpg.enabled=true \
--set server.managedSecrets=true \
--set server.masterKey=x \
--set processor.enabled=true \
--set processor.managedSecrets=true

Rendered the focused CNPG path with the web role forced on:

helm template tuist infra/helm/tuist \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set postgresql.mode=cnpg \
--set postgresql.cnpg.enabled=true \
--set server.managedSecrets=true \
--set server.masterKey=x \
--set processor.enabled=true \
--set processor.managedSecrets=true \
--set postgresql.cnpg.roles.web.useForServer=true

Rendered managed staging DB/server manifests with unrelated CI-only image guards disabled locally:

helm template tuist infra/helm/tuist \
-f infra/helm/tuist/values-managed-common.yaml \
-f infra/helm/tuist/values-managed-staging.yaml \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set runnersFleet.enabled=false \
--set runnersFleetLinux.enabled=false \
--set kuraController.enabled=false \
--set postgresql.cnpg.roles.web.useForServer=true

Parsed the changed Elixir files with the pinned Elixir runtime:

elixir -e 'Code.string_to_quoted!(File.read!("server/lib/tuist/environment.ex")); Code.string_to_quoted!(File.read!("server/lib/tuist/release.ex")); Code.string_to_quoted!(File.read!("server/config/runtime.exs"))'

Checked the shared database URL parser decodes URL-encoded credentials:

elixir -e 'Application.ensure_all_started(:mix); Code.compile_file("server/lib/tuist/environment.ex"); config = Tuist.Environment.database_config_from_url("ecto://tuist_web:p%40ss%3Aword@postgres.example/tuist?ssl=true"); unless Keyword.fetch!(config, :username) == "tuist_web" and Keyword.fetch!(config, :password) == "p@ss:word" and Keyword.fetch!(config, :hostname) == "postgres.example" and Keyword.fetch!(config, :database) == "tuist", do: raise("database_config_from_url check failed")'

Checked whitespace:

git diff --check

mix format could not run in this fresh worktree because server/deps is absent and the formatter imports ecto_sql.

Follow-Up Work

  • Complete the post-cutover cleanup once all managed environments have soaked on tuist_web.
  • Add custom schema support as a separate PR, including the explicit decision for how non-public schemas should handle the currently public.-qualified structure.sql.
  • Consider an explicit integration test around the migration/runtime URL split once test DB setup is available in this worktree.
  • After the first managed deploy creates the tuist_web role/Secret, verify the next deploy switches server pods to the runtime role and applies grants successfully.

Post-Cutover Cleanup

After staging, canary, and production have all run successfully on tuist_web, there should be a cleanup pass to make the hardened role model the steady state rather than a rollout mechanism.

Recommended cleanup:

  1. Make the cutover explicit in managed values.

    • Set postgresql.cnpg.roles.web.useForServer: true for each managed environment after that environment has switched successfully.
    • Stop relying on auto once the rollout is complete, because a foundational credential change should not happen accidentally on an unrelated future deploy.
  2. Confirm steady-state role usage.

    • Server pods should use the pg-tuist-web Secret.
    • Migration jobs should continue using the CNPG owner Secret (<cluster>-app).
    • Processor pods should continue using the processor role.
    • pg_stat_activity should show tuist_app only for migrations/operator activity, not steady-state web traffic.
  3. Rotate the owner credential.

    • The owner role was historically exposed to steady-state server pods.
    • Once web pods no longer use it, rotate the CNPG owner credential so any old runtime exposure is retired.
    • Keep the owner role and Secret; migrations still need them.
  4. Update runbooks to describe the final steady state.

    • Replace rollout-gate language with the expected final role model.
    • Document how to verify server/migration/processor role usage during deploys.
  5. Remove temporary rollout scaffolding in a later cleanup PR.

    • Once all managed environments have soaked across several deploys, simplify the chart so managed CNPG server pods always use the web runtime role.
    • Keep explicit escape hatches only if we decide they are operationally useful long term.
Comments

No GitHub comments yet.