Hive
fix(observability): report Oban failures after retries
GitHub issue · Closed
What changed
By default, Hive now reports an Oban job failure to Sentry only after the job has exhausted all of its attempts.
The change adds a named reporting predicate for the Sentry Oban integration, switches the runtime default for SENTRY_OBAN_REPORT_RETRIES to false, updates the Helm chart default to match, and documents the new behavior for self-hosted deployments.
Why
The Sentry issue came from a first-attempt timeout in the GitHub issue classification worker. Oban still had retry attempts available, so reporting the first failure created alert noise before the background job had a chance to recover.
Fixes HIVE-A.
Root cause
Hive configured Sentry to report retryable Oban failures by default. That meant an agent session timeout generated a Sentry event on attempt one even though Oban would retry the job and only discard it after all attempts were exhausted.
Approach
Keep the worker behavior unchanged so timeout errors still flow through Oban and use Oban’s retry budget. Move the noise control to Sentry’s Oban integration by installing a callback that reports only when attempt >= max_attempts.
Operators can still opt into every failed attempt by setting SENTRY_OBAN_REPORT_RETRIES=true.
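The reporting rule described above can be sketched as a small pure predicate. This is only an illustration under stated assumptions, not the code from this change: the module name RetryReportingSketch and its function names are hypothetical, the job is modeled as any map with attempt and max_attempts keys (the fields the Approach section refers to), and the wiring into Sentry's Oban integration is not shown.

```elixir
defmodule RetryReportingSketch do
  @moduledoc false

  # Hypothetical helper: true once the job is on its last allowed attempt,
  # mirroring the `attempt >= max_attempts` rule from the description.
  def final_attempt?(%{attempt: attempt, max_attempts: max_attempts}) do
    attempt >= max_attempts
  end

  # Hypothetical parser for SENTRY_OBAN_REPORT_RETRIES. Assumption: only the
  # literal string "true" opts in; a missing or different value keeps the
  # quieter default of false.
  def report_retries?(env \\ System.get_env()) do
    Map.get(env, "SENTRY_OBAN_REPORT_RETRIES", "false") == "true"
  end

  # Report every failed attempt when the operator opted in; otherwise
  # report only the failure that exhausts the retry budget.
  def should_report?(job, env \\ System.get_env()) do
    report_retries?(env) or final_attempt?(job)
  end
end
```

With an empty environment map, RetryReportingSketch.should_report?(%{attempt: 1, max_attempts: 3}, %{}) returns false, while the same call with attempt: 3 returns true. Passing the environment in as an argument keeps the predicate easy to test without touching real environment variables.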
Impact
Retryable background job failures no longer alert by default while Oban is still retrying. Jobs that exhaust all attempts continue to report to Sentry with the same Oban metadata.
Helm users get the same quieter default through the chart values. There is no migration required.
Validation
mix format
mix test test/hive/sentry_event_filter_test.exs test/hive/forage/github_issue_classification_worker_test.exs
helm template hive infra/helm/hive
mix compile --warnings-as-errors
helm template hive infra/helm/hive -f infra/helm/hive/values-production.yaml
mix test (passed with 737 tests)