Hive
feat(server): split database runtime and migration roles
GitHub issue · Closed
Summary
This PR introduces the first step of the database hardening plan: splitting the database role used for schema ownership and migrations from the role used by the running Tuist web application.
Today, managed CNPG deployments expose the CNPG-generated <cluster>-app Secret to both the migration Job and server pods. That Secret belongs to the postgresql.cnpg.owner role (tuist_app by default), which is the role created during bootstrap.initdb and the owner of objects produced by Ecto migrations. In practice, the web runtime therefore runs with the same privileges needed to create and evolve schema objects.
That works operationally, but it is broader than the runtime needs. It also makes it difficult to support customers with stricter database controls, because there was no first-class way to say:
- migrations should run as the schema owner
- the web application should run as a narrower DML role
- migrations should keep that runtime role up to date as tables, sequences, and functions change
This PR adds that split while keeping the rollout safe for existing managed deployments.
Motivation
This came from the investigation into whether Tuist can avoid requiring broad access to PostgreSQL’s default public schema for customers with stricter internal policies.
The important distinction is:
- using the
publicschema is not automatically a security smell - using an application runtime role with schema ownership / DDL privileges is a much stronger security smell
Custom schema support is still a separate concern and is not implemented here. Before adding custom-schema support, we should make the role model safer and more explicit. Otherwise a custom schema would still be accessed by an overly powerful runtime role, just in a different namespace.
This PR therefore focuses on the foundational role boundary:
tuist_app: owner / migration roletuist_web: web runtime roletuist_processor: existing least-privilege processor roletuist_ops_ro: existing read-only operator role
That gives us a safer database posture now and a cleaner foundation for custom schema support later.
Root Cause / Previous Behavior
This is not a bugfix for a production incident, but it does fix a structural limitation in our deployment model.
Before this PR:
Tuist.Release.migrate/0loaded the application and ran migrations using the configuredDATABASE_URL.- In managed CNPG mode, both the migration Job and server Deployment read
DATABASE_URLfrom the CNPG owner Secret (<cluster>-app). - There was no
TUIST_MIGRATION_DATABASE_URLoverride for release migrations. - There was no post-migration grant step to keep a runtime role aligned with newly-created tables, sequences, or functions.
- The managed chart already had narrower roles for processor and Ops read-only access, but not for the web runtime.
The observed limitation was that a deployment could not safely set DATABASE_URL to a narrow web role and still run migrations, because migrations need ownership/DDL privileges and future objects would not automatically be granted back to the runtime role.
What Changed
Release Migration Role Support
Tuist.Release.migrate/0 now supports TUIST_MIGRATION_DATABASE_URL.
When set, the release task:
- forces the migration pool size to 1, as before
- uses
TUIST_MIGRATION_DATABASE_URLas the effective migration URL - updates the loaded
Tuist.Reporuntime config so release migrations really use the migration URL even when the release was booted with a narrowerDATABASE_URL - leaves other repos untouched, including
Tuist.IngestRepo
This matters because release config is evaluated before the migration function body in some boot paths. Merely setting DATABASE_URL inside the function is not enough; the already-loaded Repo config must also be updated.
Runtime Role Grants
Tuist.Release.migrate/0 now optionally reads TUIST_DATABASE_RUNTIME_ROLE.
When set, after PostgreSQL migrations complete, it grants the runtime role:
CONNECTon the application databaseUSAGEonpublic- no
CREATEonpublic SELECT,INSERT,UPDATE,DELETEon all existing tables inpublicUSAGE,SELECTon all existing sequences inpublicEXECUTEon all existing functions inpublic- read-only access to
public.schema_migrations - default privileges for future tables, sequences, and functions
It also revokes CREATE from PUBLIC on the database/schema path as defense in depth.
The runtime role name is validated as an unquoted PostgreSQL identifier before being interpolated, and identifiers are quoted before SQL execution.
Runtime Database URL Parsing
The chart composes database URLs with URL-encoded passwords. Tuist.Environment now centralizes database URL parsing so runtime boot and release migrations decode credentials the same way before passing them to Postgrex.
This is important for generated passwords containing @, :, /, ?, or similar URL-significant characters. The processor path already encoded passwords in URLs; the new web runtime URL follows the same pattern.
CNPG Web Runtime Role
The Helm chart now declares a CNPG-managed web runtime role:
postgresql:
cnpg:
roles:
web:
name: tuist_web
CNPG creates and reconciles that role the same way it already handles tuist_processor and tuist_ops_ro.
The chart also adds a WEB_DATABASE_PASSWORD ExternalSecret, producing the managed role Secret:
usernamepassworduri
The uri points at the CNPG -rw Service and uses the tuist_web role.
Managed Server / Migration Wiring
In managed CNPG mode:
- migration jobs continue to use the CNPG owner Secret (
<cluster>-app) - server pods can switch to the
tuist_webSecret TUIST_USE_SSL_FOR_DATABASE=0remains unchanged for in-cluster CNPG connections- self-hosted/unmanaged CNPG installs keep the previous owner-Secret behavior by default
The migration Job also exports TUIST_MIGRATION_DATABASE_URL from the owner Secret, so the release task has an explicit migration URL even if the runtime URL is different.
Rollout Strategy
The rollout sequence is the trickiest part of this change because our managed migration Job runs as a Helm pre-upgrade hook.
Helm pre-upgrade hooks run before normal resources are applied. That means the first deployment containing this chart change cannot assume that CNPG has already reconciled the new tuist_web managed role and Secret.
Preflight: Create the 1Password Items
Before deploying this PR to a managed CNPG environment, create a WEB_DATABASE_PASSWORD item in that environment’s 1Password vault:
tuist-k8s-stagingtuist-k8s-canarytuist-k8s-production
The item needs a password field with a strong generated password. The chart references it through:
postgresql:
cnpg:
externalSecrets:
webPassword:
item: WEB_DATABASE_PASSWORD
field: password
The onepassword ClusterSecretStore is scoped per environment to the matching tuist-k8s-<env> vault, so the item name can be the same in each vault. Preview currently uses embedded Postgres, so it does not need this item for this rollout.
Status: the required WEB_DATABASE_PASSWORD 1Password items have been created in the staging, canary, and production environment vaults with generated 64-character passwords. The generated password values were not printed or copied into the PR.
If this item is missing, the ExternalSecret for pg-tuist-web will not become Ready, CNPG will not be able to reconcile the tuist_web password Secret, and the rollout gate should remain closed.
To avoid a first-upgrade failure, the chart includes a rollout gate:
postgresql:
cnpg:
roles:
web:
useForServer: auto
auto means:
-
First deploy:
- render the CNPG-managed
tuist_webrole and ExternalSecret - keep server pods on the owner Secret
- do not ask the migration Job to grant
tuist_webyet
- render the CNPG-managed
-
CNPG/ESO reconcile:
- the
tuist_webrole exists - the
pg-tuist-webSecret exists
- the
-
Next deploy:
- the chart detects the existing web-role Secret
- the migration Job sets
TUIST_DATABASE_RUNTIME_ROLE=tuist_web - migrations apply grants to
tuist_web - server pods switch to the
tuist_webruntime URL
Operators can also set postgresql.cnpg.roles.web.useForServer=true to force the switch after pre-provisioning the role/Secret, or false to keep using the owner Secret.
Security Model After This PR
After the rollout gate opens, managed CNPG has this intended split:
| Role | Purpose | Privilege Shape |
|---|---|---|
tuist_app |
Migrations / schema ownership | Owns Ecto-created objects and runs schema changes |
tuist_web |
Web runtime | DML on app tables, sequence usage, function execution, read-only schema_migrations, no schema creation |
tuist_processor |
Dedicated Oban processor | Existing narrow grants for oban_jobs, oban_peers, and required read-only lookup tables |
tuist_ops_ro |
Operator inspection | Read-only operator role via pg_read_all_data plus explicit write revokes |
This reduces the blast radius of a web-runtime credential compromise. It does not claim to solve every database isolation concern:
- it does not add custom schema support yet
- it does not remove the
publicschema from existing deployments - it does not make the runtime role read-only, because the app legitimately writes application data
- it does not change ClickHouse /
Tuist.IngestRepobehavior
Customer Impact
For Tuist-managed environments, this is a safe phased hardening path. Once rolled out, the web application no longer needs to connect as the schema owner.
For self-hosted operators, this adds the primitives needed to run a split-role setup:
- set
DATABASE_URLto the runtime role URL - set
TUIST_MIGRATION_DATABASE_URLto the owner/migration role URL when running release migrations - set
TUIST_DATABASE_RUNTIME_ROLEto the runtime role name so migrations maintain grants
Existing single-role setups continue to work because the new environment variables are optional.
For Toss specifically, this PR does not yet provide custom schema support. It does make the database story safer and moves us toward the model they likely need: a runtime role scoped to exactly the privileges Tuist needs, followed by custom-schema support in a separate PR.
structure.sql and Custom Schema Follow-Up
The committed PostgreSQL structure dump remains role-neutral in this PR. It does not include CREATE ROLE, OWNER TO, GRANT, REVOKE, or ALTER DEFAULT PRIVILEGES statements, so it should not bake managed role names such as tuist_app or tuist_web into fresh database bootstrap.
With the role split introduced here, fresh database bootstrap still works as:
- connect with the owner/migration role
- load
structure.sqlif the database is empty - run pending migrations
- apply the runtime grants to
TUIST_DATABASE_RUNTIME_ROLE
That means the runtime role grant step covers objects loaded from structure.sql as well as objects created by migrations.
Custom schema support should be handled in a follow-up PR. The current structure.sql is schema-qualified with public., so a future custom-schema implementation should not assume that changing search_path is enough. The safer initial behavior for non-public schemas is likely to skip the current structure dump and run migrations from scratch, unless we later add a proper schema-aware dump/remapping path.
Files Worth Reviewing Closely
-
server/lib/tuist/release.ex- migration URL override
- runtime role grant SQL
- runtime role identifier validation
-
server/lib/tuist/environment.exTUIST_MIGRATION_DATABASE_URLandTUIST_DATABASE_RUNTIME_ROLEaccessors- shared database URL parser used by runtime config and release migrations
-
server/config/runtime.exs- consumes the shared parser before passing DB credentials to Postgrex
-
infra/helm/tuist/templates/_helpers.tpltuist.cnpgUseWebRolerollout gate
-
infra/helm/tuist/templates/postgresql-cnpg.yaml- CNPG-managed
tuist_webrole WEB_DATABASE_PASSWORDExternalSecret
- CNPG-managed
-
infra/helm/tuist/templates/server-deployment.yaml- gated switch from owner Secret to web runtime Secret
-
infra/helm/tuist/templates/server-migration-job.yaml- migration job remains on owner Secret
- runtime grants only enabled when the gate is open
Validation
Ran:
helm lint infra/helm/tuist
Rendered the focused CNPG path with the default rollout gate:
helm template tuist infra/helm/tuist \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set postgresql.mode=cnpg \
--set postgresql.cnpg.enabled=true \
--set server.managedSecrets=true \
--set server.masterKey=x \
--set processor.enabled=true \
--set processor.managedSecrets=true
Rendered the focused CNPG path with the web role forced on:
helm template tuist infra/helm/tuist \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set postgresql.mode=cnpg \
--set postgresql.cnpg.enabled=true \
--set server.managedSecrets=true \
--set server.masterKey=x \
--set processor.enabled=true \
--set processor.managedSecrets=true \
--set postgresql.cnpg.roles.web.useForServer=true
Rendered managed staging DB/server manifests with unrelated CI-only image guards disabled locally:
helm template tuist infra/helm/tuist \
-f infra/helm/tuist/values-managed-common.yaml \
-f infra/helm/tuist/values-managed-staging.yaml \
--show-only templates/server-deployment.yaml \
--show-only templates/server-migration-job.yaml \
--show-only templates/postgresql-cnpg.yaml \
--set runnersFleet.enabled=false \
--set runnersFleetLinux.enabled=false \
--set kuraController.enabled=false \
--set postgresql.cnpg.roles.web.useForServer=true
Parsed the changed Elixir files with the pinned Elixir runtime:
elixir -e 'Code.string_to_quoted!(File.read!("server/lib/tuist/environment.ex")); Code.string_to_quoted!(File.read!("server/lib/tuist/release.ex")); Code.string_to_quoted!(File.read!("server/config/runtime.exs"))'
Checked the shared database URL parser decodes URL-encoded credentials:
elixir -e 'Application.ensure_all_started(:mix); Code.compile_file("server/lib/tuist/environment.ex"); config = Tuist.Environment.database_config_from_url("ecto://tuist_web:p%40ss%3Aword@postgres.example/tuist?ssl=true"); unless Keyword.fetch!(config, :username) == "tuist_web" and Keyword.fetch!(config, :password) == "p@ss:word" and Keyword.fetch!(config, :hostname) == "postgres.example" and Keyword.fetch!(config, :database) == "tuist", do: raise("database_config_from_url check failed")'
Checked whitespace:
git diff --check
mix format could not run in this fresh worktree because server/deps is absent and the formatter imports ecto_sql.
Follow-Up Work
- Complete the post-cutover cleanup once all managed environments have soaked on
tuist_web. - Add custom schema support as a separate PR, including the explicit decision for how non-
publicschemas should handle the currentlypublic.-qualifiedstructure.sql. - Consider an explicit integration test around the migration/runtime URL split once test DB setup is available in this worktree.
- After the first managed deploy creates the
tuist_webrole/Secret, verify the next deploy switches server pods to the runtime role and applies grants successfully.
Post-Cutover Cleanup
After staging, canary, and production have all run successfully on tuist_web, there should be a cleanup pass to make the hardened role model the steady state rather than a rollout mechanism.
Recommended cleanup:
-
Make the cutover explicit in managed values.
- Set
postgresql.cnpg.roles.web.useForServer: truefor each managed environment after that environment has switched successfully. - Stop relying on
autoonce the rollout is complete, because a foundational credential change should not happen accidentally on an unrelated future deploy.
- Set
-
Confirm steady-state role usage.
- Server pods should use the
pg-tuist-webSecret. - Migration jobs should continue using the CNPG owner Secret (
<cluster>-app). - Processor pods should continue using the processor role.
pg_stat_activityshould showtuist_apponly for migrations/operator activity, not steady-state web traffic.
- Server pods should use the
-
Rotate the owner credential.
- The owner role was historically exposed to steady-state server pods.
- Once web pods no longer use it, rotate the CNPG owner credential so any old runtime exposure is retired.
- Keep the owner role and Secret; migrations still need them.
-
Update runbooks to describe the final steady state.
- Replace rollout-gate language with the expected final role model.
- Document how to verify server/migration/processor role usage during deploys.
-
Remove temporary rollout scaffolding in a later cleanup PR.
- Once all managed environments have soaked across several deploys, simplify the chart so managed CNPG server pods always use the web runtime role.
- Keep explicit escape hatches only if we decide they are operationally useful long term.
No GitHub comments yet.