feat(server): make the GitHub webhook hot path leave Postgres alone
GitHub issue · Closed
Summary
Pulls two PG round-trips out of the /webhooks/github request handler so a burst of inbound workflow_job deliveries can’t saturate the Ecto pool and start returning 502 to GitHub (which we observed during the #10886 bring-up when ~30 deliveries hit at once).
Diagnosis traced the slow path to two synchronous Postgres calls every webhook makes before returning 200:
- VCS.list_github_app_installations_for_webhook/2: an uncached Repo.all for the HMAC candidate-set lookup, which runs before signature verification.
- Oban.insert/1: runs even for workflow_job.action values the dispatch worker treats as :ignored (most notably in_progress, which is ~33 % of workflow_job traffic).
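Both calls go through the same per-replica pool. Because Ecto’s 15 s checkout timeout is longer than the 9 s upstream timeout, a burst that overflows the pool leaves requests queued for a connection past the upstream limit, which produces the exact 9 s upstream-timeout 502 we caught in GitHub’s hookshot delivery log.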
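To illustrate why the installation lookup sits in front of signature verification, here is a minimal sketch of an HMAC candidate-set check. It assumes the standard GitHub X-Hub-Signature-256 header format (sha256= followed by the hex HMAC-SHA256 of the raw body). The module and function names are hypothetical and are not taken from the codebase.

```elixir
defmodule WebhookSignatureSketch do
  # Hypothetical helper for illustration only; not the project's actual code.

  # Computes the header value GitHub would send for `raw_body` under `secret`.
  def sign(secret, raw_body) do
    digest = :crypto.mac(:hmac, :sha256, secret, raw_body)
    "sha256=" <> Base.encode16(digest, case: :lower)
  end

  # Returns the first candidate secret whose signature matches the header,
  # or nil when none of them does.
  def find_matching_secret(candidate_secrets, raw_body, signature_header) do
    Enum.find(candidate_secrets, fn secret ->
      secure_equal?(sign(secret, raw_body), signature_header)
    end)
  end

  # XOR-accumulating comparison that does not stop at the first mismatching
  # byte. Inputs of different length (or non-binaries) are rejected outright.
  defp secure_equal?(a, b) when byte_size(a) == byte_size(b) do
    pairs = Enum.zip(:binary.bin_to_list(a), :binary.bin_to_list(b))

    diff =
      Enum.reduce(pairs, 0, fn {x, y}, acc ->
        :erlang.bor(acc, :erlang.bxor(x, y))
      end)

    diff == 0
  end

  defp secure_equal?(_a, _b), do: false
end
```

The candidate secrets have to be loaded before any signature can be checked, which is why an uncached lookup puts a database read on every delivery, valid or not.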
What changed
Controller: skip the Oban.insert when the action isn’t dispatchable. Tuist.Runners.Dispatch.handle_webhook/2 only persists queued and completed; every other action falls through its catch-all :ignored branch. The controller now mirrors that gate and 200s without enqueueing, removing the PG write for ~33 % of inbound workflow_job deliveries.
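The controller-side gate can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual controller code: the module name, dispatchable?/1, handle/2, and the enqueue callback (standing in for the Oban.insert call) are all hypothetical. Only the queued/completed allowlist comes from the description above.

```elixir
defmodule WorkflowJobGateSketch do
  # Actions the description says the dispatch worker persists.
  # Every other action would end up in the worker's :ignored branch.
  @dispatchable_actions ["queued", "completed"]

  # True only for payloads whose "action" is on the allowlist.
  def dispatchable?(%{"action" => action}) when action in @dispatchable_actions, do: true
  def dispatchable?(_payload), do: false

  # `enqueue` is a hypothetical 1-arity callback standing in for the job insert.
  # It is only invoked for dispatchable payloads, so ignored actions cause
  # no database write at all.
  def handle(payload, enqueue) when is_function(enqueue, 1) do
    if dispatchable?(payload) do
      enqueue.(payload)
      :enqueued
    else
      :ignored
    end
  end
end
```

Using an allowlist rather than a denylist means an action GitHub adds in the future is skipped by default, which matches the "unknown action → no enqueue" behaviour listed in the test plan.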
VCS: cache list_github_app_installations_for_webhook/2. 60 s TTL, keyed by (installation_id, app_id). Steady-state webhook traffic skips the Repo.all/1 entirely. The TTL is the sole invalidation mechanism: the only field a stale entry could mis-serve (webhook_secret) only rotates through replace_github_app_installation/2, whose manifest-re-registration flow takes longer than 60 s anyway. The other three mutators (create, update, delete) can’t produce a stale-cache scenario worth instrumenting:
- update_github_app_installation/2’s changeset whitelists only :html_url, :installation_id, :app_slug; none of them affect HMAC verification.
- delete_github_app_installation/1 removes a row, but GitHub stops dispatching to an uninstalled App, so any in-flight webhook is signed by the still-valid (about-to-be-deleted) secret.
- create_github_app_installation/1 has no entry to evict: the first lookup is a cold miss by definition.
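The TTL-only caching described above can be sketched as a small read-through cache. The description does not say which cache library the real code uses, so this version assumes a plain ETS table. The module name, table name, and the loader callback (standing in for the uncached Repo.all query) are hypothetical.

```elixir
defmodule InstallationLookupCacheSketch do
  # Hypothetical ETS-backed read-through cache; illustration only.
  @table :installation_lookup_cache_sketch
  @ttl_ms 60_000

  # Creates the named table. Call once before the first fetch.
  def init do
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    :ok
  end

  # Looks up {installation_id, app_id}. On a miss or an expired entry it runs
  # `loader` and stores the result for @ttl_ms. `now_ms` is injectable so the
  # expiry logic can be exercised without sleeping.
  def fetch(installation_id, app_id, loader, now_ms \\ System.monotonic_time(:millisecond)) do
    key = {installation_id, app_id}

    case :ets.lookup(@table, key) do
      [{^key, value, expires_at}] when expires_at > now_ms ->
        value

      _miss_or_expired ->
        value = loader.()
        :ets.insert(@table, {key, value, now_ms + @ttl_ms})
        value
    end
  end
end
```

There is deliberately no explicit eviction call here: as in the description, expiry after 60 s is the only way an entry leaves the cache, and (iid, nil) and (iid, aid) occupy separate slots because the key is the full tuple.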
Test plan
- mix test test/tuist/vcs_test.exs test/tuist_web/controllers/webhooks/github_controller_test.exs: 125 tests, 0 failures
- New cache-behaviour tests in vcs_test.exs cover: cache hit on repeated query, distinct cache slots for (iid, nil) vs (iid, aid), no-op when both filters are nil
- New controller tests in github_controller_test.exs cover: in_progress → no enqueue, unknown action → no enqueue, completed → enqueue, plus the existing queued and no installation.id paths
- mix credo clean on touched files
- mix format --check-formatted clean
🤖 Generated with Claude Code