Hive Hive
Sign in

fix(server): batch account-usage job inserts under the Postgres param limit

GitHub issue · Closed

Metadata
Source
tuist/tuist #11397
Updated
Jun 24, 2026
Domains
Atlas
Details

What changed

Tuist.Accounts.Workers.UpdateAllAccountsUsageWorker now chunks the per-account UpdateAccountUsageWorker jobs into batches of 1,000 before calling Oban.insert_all/1, instead of inserting the entire accumulated list in a single statement.

Why it changed (root cause)

This is the daily fan-out that refreshes each account’s cached monthly usage (@daily cron on hosted prod web pods, defined in lib/tuist/oban/runtime_config.ex). It paginated the DB read (page_size 100), but map_accounts_to_workers/1 is recursive and concatenated every page’s jobs into one list (workers ++ ...), which perform/1 then inserted with a single Oban.insert_all/1.

Oban binds 9 columns per job row, and PostgreSQL’s wire protocol caps a statement at 65,535 bind parameters. Once the backlog of stale accounts crossed 65535 ÷ 9 ≈ 7,281, the single multi-row INSERT blew the limit:

Postgrex.QueryError: postgresql protocol can not handle 69903 parameters, the maximum is 65535

69903 ÷ 9 = 7767 jobs in that failing insert.

The failure was self-reinforcing: the insert failed → no per-account jobs were enqueued → no account’s current_month_remote_cache_hits_count_updated_at was bumped → the backlog never drained and grew every day, until the job exhausted all 20 Oban attempts and the usage rollup stalled in production.

Why this approach over the obvious alternative

The natural alternative — insert each page as it’s read instead of accumulating — was deliberately not chosen. The per-account workers drain asynchronously and flip accounts out of the not_updated_today filter; with offset-based pagination that shifts the offsets mid-run and silently skips accounts. The existing accumulate-first design reads a stable snapshot before any insert. Chunking the final list preserves that snapshot semantics while bounding each insert to ~9,000 params (comfortable margin against future Oban column additions).

Impact

  • The daily account-usage refresh works again for large hosted instances.
  • First run after deploy will enqueue the full accumulated backlog across many batches — large but well within limits; it drains the stall.
  • No behavior change for normal-sized instances (single batch, identical to before).

Validation

  • mix compile — clean; mix credo on the worker — no issues.
  • Existing 5 tests still pass.
  • Added a regression test that stubs 2,500 accounts and asserts Oban.insert_all is invoked in 3 batches each ≤ 1,000 (the old code made a single 2,500-job call). All 6 tests pass.

🤖 Generated with Claude Code

Comments

No GitHub comments yet.