fix(server): grant runtime role EXECUTE only on migration-owned functions
GitHub issue · Closed
What changed
The post-migration runtime-role grant in Tuist.Release no longer uses the blanket GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public. It now iterates the functions the migration role owns and grants EXECUTE on each individually.
Why (root cause)
The database role split (#11408) added a step that, after migrations run, grants the narrow runtime role (tuist_web) the privileges it needs on the objects migrations manage. For functions it ran:
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO <runtime_role>
GRANT ... ON ALL FUNCTIONS IN SCHEMA requires the granting role to own (or hold grant option on) every function in the schema; if it hits one it cannot grant on, the entire statement aborts. On CNPG-managed clusters the built-in PgBouncer pooler integration installs a superuser-owned user_search auth function into public. The migration role (tuist_app, the <cluster>-app owner) doesn't own it, so the statement failed with:
** (Postgrex.Error) ERROR 42501 (insufficient_privilege) permission denied for function user_search
The migration runs as a pre-upgrade Helm hook, so this failed the hook, rolled the release back (rollback-on-failure), and wedged the Deploy to tuist-canary step. Because the production cascade runs canary → acceptance → production, every cascade stalled at canary, and the cut release would have hit production the same way once it got there.
Note: the chart comment / infra/cnpg/README.md state that CNPG installs user_search in the postgres database. In practice it also lands in the application database’s public schema (the failing GRANT ran against the tuist DB), which is why this wasn’t caught in review.
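One way to check which functions in public would trip the blanket grant is to list those the current role does not own. This is a minimal sketch, assuming it is run while connected to the application database as the migration role; the column aliases are illustrative and not taken from the change itself.

```sql
-- List functions in the public schema that the current role does not own.
-- Assumption: connected to the application database as the migration role.
SELECT p.oid::regprocedure AS function_signature,
       p.proowner::regrole AS owner
FROM pg_proc AS p
WHERE p.pronamespace = 'public'::regnamespace
  AND p.proowner <> current_user::regrole
ORDER BY 1;
```

On a cluster where the pooler integration has installed its auth function into the application database, user_search would be expected to appear in this output with a superuser owner.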
Not the runners-controller
The failure surfaced alongside runners: desired_replicas failed / tokenreview 401 log lines and a momentarily missing runners-controller pod (transient DiskPressure eviction). Those are “Diagnostics on failure” noise from the old server pod, not the cause — the helm step failed solely on the pre-upgrade migration hook.
Why this solution
- The runtime role only needs EXECUTE on functions the app calls, all of which are created by migrations and owned by the migration role. It never calls operator-owned functions like user_search.
- Scoping by ownership (proowner = current_user::regrole) is the correct semantics for “grant access to the objects migrations manage” and is robust against any future operator/extension function landing in public.
- Procedures are excluded (prokind <> 'p') because GRANT EXECUTE ON FUNCTION rejects them, matching the original ALL FUNCTIONS behavior.
- Tables and sequences keep the blanket grant: public has no views or foreign-owned relations, so those statements already only touch migration-owned objects. Converting them would risk dropping grants on view/matview classes for no benefit.
- ALTER DEFAULT PRIVILEGES ... GRANT EXECUTE ON FUNCTIONS is unchanged and continues to cover functions the migration role creates from here on.
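The ownership-scoped grant described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SQL that Tuist.Release actually generates: the variable name func is hypothetical, and tuist_web stands in for whatever runtime role is configured.

```sql
-- Sketch: grant EXECUTE only on functions owned by the role running the block.
-- Assumptions: run as the migration role; tuist_web is the runtime role.
DO $$
DECLARE
  func regprocedure;
BEGIN
  FOR func IN
    SELECT p.oid::regprocedure
    FROM pg_proc AS p
    WHERE p.pronamespace = 'public'::regnamespace
      AND p.proowner = current_user::regrole  -- skip operator-owned functions
      AND p.prokind <> 'p'                    -- skip procedures
  LOOP
    -- regprocedure renders as a quoted name plus argument types,
    -- so overloaded functions are granted individually.
    EXECUTE format('GRANT EXECUTE ON FUNCTION %s TO %I', func, 'tuist_web');
  END LOOP;
END
$$;
```

Because the loop only ever sees functions the current role owns, a foreign-owned function such as user_search is never named in a GRANT, so it cannot abort the statement.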
Impact
Unblocks the canary step (and therefore the whole production cascade). Prevents the same failure from reaching production. No effect on environments that don’t set TUIST_DATABASE_RUNTIME_ROLE (dev, self-host) — the grant step is skipped there.
Validation
- mix format --check-formatted and mix compile pass.
- Rendered the generated SQL to confirm well-formed dollar-quoted PL/pgSQL with the role identifier safely interpolated (role is already regex-validated as a bare identifier).
- End-to-end validation is the canary cascade itself once this lands: the migrate Job runs the fixed Release.migrate, the grant succeeds, and the pre-upgrade hook passes.