
feat(server): make the GitHub webhook hot path leave Postgres alone

GitHub issue · Closed

Source: tuist/tuist #10902
Updated: Jun 24, 2026
Domains: Compute

Summary

Pulls two PG round-trips out of the /webhooks/github request handler so that a burst of inbound workflow_job deliveries can’t saturate the Ecto pool and start returning 502 to GitHub. We observed exactly that during the #10886 bring-up, when ~30 deliveries hit at once.

Diagnosis traced the slow path to two synchronous Postgres calls every webhook makes before returning 200:

  1. VCS.list_github_app_installations_for_webhook/2 — uncached Repo.all for HMAC candidate-set lookup, runs before signature verification.
  2. Oban.insert/1 — even for workflow_job.action values the dispatch worker treats as :ignored (most notably in_progress, which is ~33 % of workflow_job traffic).

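Both go through the same per-replica pool (the first is a read, the second a write). Because Ecto’s 15 s checkout timeout is longer than the 9 s upstream timeout, a burst that overflows the pool leaves requests waiting for a connection until the upstream gives up, which produces the exact 9 s upstream-timeout 502 we caught in GitHub’s hookshot delivery log.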
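
For context on why the first call sits in front of signature verification: the handler needs the candidate installations' secrets before it can check the HMAC. The following is a minimal sketch of that candidate-set check, assuming GitHub's `X-Hub-Signature-256` header format (`sha256=` followed by the hex HMAC-SHA256 of the raw body). The module name, function names, and the shape of the candidate maps are hypothetical; only the `webhook_secret` field name comes from this PR.

```elixir
defmodule WebhookSignatureSketch do
  @moduledoc "Illustrative only; not the Tuist implementation."

  # Returns the first candidate whose secret reproduces the signature header,
  # or nil when no candidate matches or the header is missing or malformed.
  def find_signer(candidates, raw_body, "sha256=" <> hex_signature) do
    given = String.downcase(hex_signature)

    Enum.find(candidates, fn %{webhook_secret: secret} ->
      expected =
        :crypto.mac(:hmac, :sha256, secret, raw_body)
        |> Base.encode16(case: :lower)

      constant_time_equal?(expected, given)
    end)
  end

  def find_signer(_candidates, _raw_body, _header), do: nil

  # XOR every byte pair and OR the results together, so the comparison
  # does not exit early at the first mismatching byte.
  defp constant_time_equal?(a, b) when byte_size(a) == byte_size(b) do
    Enum.zip(:binary.bin_to_list(a), :binary.bin_to_list(b))
    |> Enum.reduce(0, fn {x, y}, acc -> :erlang.bor(acc, :erlang.bxor(x, y)) end)
    |> Kernel.==(0)
  end

  defp constant_time_equal?(_a, _b), do: false
end
```

In this sketch the candidate list is passed in by the caller, which is the part the handler previously fetched from Postgres on every delivery.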

What changed

Controller: skip the Oban.insert when the action isn’t dispatchable. Tuist.Runners.Dispatch.handle_webhook/2 only persists queued and completed; every other action falls through its catch-all :ignored branch. The controller now mirrors that gate and 200s without enqueueing, removing the PG write for ~33 % of inbound workflow_job deliveries.
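
The gate described above can be sketched roughly as follows. This is an assumption-laden illustration, not the controller code from the PR: the module name, the `handle/2` shape, and the injected `enqueue` function (standing in for the Oban.insert call) are all hypothetical. Only the set of persisted actions, queued and completed, comes from the description.

```elixir
defmodule WebhookGateSketch do
  @moduledoc "Illustrative only; names here are hypothetical."

  # The two workflow_job actions the description says the dispatch worker persists.
  @dispatchable_actions ["queued", "completed"]

  def dispatchable?(%{"action" => action}) when action in @dispatchable_actions, do: true
  def dispatchable?(_payload), do: false

  # `enqueue` is a one-argument function standing in for the job insert,
  # so the sketch runs without Oban or a database.
  def handle(payload, enqueue) when is_function(enqueue, 1) do
    if dispatchable?(payload) do
      enqueue.(payload)
      :enqueued
    else
      # Nothing is written; the caller can still respond with 200.
      :ignored
    end
  end
end
```

Keeping the action list in one place matters here: if the worker later starts persisting another action, a controller-side gate like this one has to change with it, otherwise those deliveries are dropped before they reach the queue.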

VCS: cache list_github_app_installations_for_webhook/2. 60 s TTL, keyed by (installation_id, app_id). Steady-state webhook traffic skips the Repo.all/1 entirely. The TTL is the sole invalidation mechanism: the only field a stale entry could mis-serve (webhook_secret) only rotates through replace_github_app_installation/2, whose manifest-re-registration flow takes longer than 60 s anyway. The other three mutators (create, update, delete) can’t produce a stale-cache scenario worth instrumenting:

  • update_github_app_installation/2’s changeset whitelists only :html_url, :installation_id, :app_slug — none affect HMAC verification.
  • delete_github_app_installation/1 removes a row, but GitHub stops dispatching to an uninstalled App, so any in-flight webhook is signed by the still-valid (about-to-be-deleted) secret.
  • create_github_app_installation/1 has no entry to evict — the first lookup is a cold miss by definition.
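
A TTL-only cache keyed by (installation_id, app_id) could look something like the sketch below. It uses a plain ETS table so it runs without extra dependencies; the PR does not say which cache backend the real change uses, so the module, the `fetch/4` signature, the injected `loader` (standing in for the Repo.all query), and the empty-list return for the both-nil case are assumptions made for illustration.

```elixir
defmodule InstallationCacheSketch do
  @moduledoc "Illustrative TTL cache; not the Tuist implementation."

  @table :installation_cache_sketch
  # 60 s TTL, matching the value stated in the description.
  @ttl_ms 60_000

  # Creates the ETS table once. The table is owned by the calling process.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    end

    :ok
  end

  def fetch(installation_id, app_id, loader, now_ms \\ System.monotonic_time(:millisecond))

  # With both filters nil there is nothing to look up (assumed to mean "no rows").
  def fetch(nil, nil, _loader, _now_ms), do: []

  def fetch(installation_id, app_id, loader, now_ms) do
    key = {installation_id, app_id}

    case :ets.lookup(@table, key) do
      # Fresh entry: serve it without calling the loader.
      [{^key, rows, expires_at}] when expires_at > now_ms ->
        rows

      # Missing or expired entry: load, store with a new expiry, return.
      _ ->
        rows = loader.(installation_id, app_id)
        :ets.insert(@table, {key, rows, now_ms + @ttl_ms})
        rows
    end
  end
end
```

Because the key is the full tuple, (iid, nil) and (iid, aid) occupy distinct slots, and no explicit eviction path exists: an entry simply stops being served once its expiry passes.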

Test plan

  • mix test test/tuist/vcs_test.exs test/tuist_web/controllers/webhooks/github_controller_test.exs — 125 tests, 0 failures
  • New cache-behaviour tests in vcs_test.exs cover: cache hit on repeated query, distinct cache slots for (iid, nil) vs (iid, aid), no-op when both filters are nil
  • New controller tests in github_controller_test.exs cover: in_progress → no enqueue, unknown action → no enqueue, completed → enqueue, plus the existing queued and no installation.id paths
  • mix credo clean on touched files
  • mix format --check-formatted clean

🤖 Generated with Claude Code
