fix(observability): report Oban failures after retries

Metadata

Source: tuist/hive #82
Updated: Jun 24, 2026
Domains: Hive

Details

What changed

By default, Hive now reports an Oban job failure to Sentry only after the job has exhausted all of its attempts.

The change adds a named reporting predicate for the Sentry Oban integration, switches the runtime default for SENTRY_OBAN_REPORT_RETRIES to false, updates the Helm chart default to match, and documents the new behavior for self-hosted deployments.

Why

The Sentry issue came from a first-attempt timeout in the GitHub issue classification worker. Oban still had retry attempts available, so reporting the first failure created alert noise before the background job had a chance to recover.

Fixes HIVE-A.

Root cause

Hive configured Sentry to report retryable Oban failures by default. As a result, an agent session timeout produced a Sentry event on the first attempt, even though Oban would retry the job and only discard it once all attempts were exhausted.

Approach

Keep the worker behavior unchanged so timeout errors still flow through Oban and use Oban’s retry budget. Move the noise control to Sentry’s Oban integration by installing a callback that reports only when attempt >= max_attempts.
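The reporting rule itself is a one-line comparison. Below is a minimal sketch of such a predicate, assuming it receives a map or struct carrying Oban's attempt and max_attempts fields (an %Oban.Job{} qualifies). The module and function names (HiveSketch.ObanReporting, final_attempt?/1) are hypothetical and not taken from the PR, and how the callback is registered with Sentry's Oban integration is deliberately not shown.

```elixir
defmodule HiveSketch.ObanReporting do
  @moduledoc """
  Hypothetical sketch of a "report only on the final attempt" predicate.
  Not Hive's actual module; the name and shape are illustrative.
  """

  @doc """
  Returns true when a failed job has used its whole retry budget.

  Accepts any map with `:attempt` and `:max_attempts`, which includes an
  `%Oban.Job{}` struct. Oban sets `attempt` to 1 on the first execution,
  so the last permitted run has `attempt == max_attempts`.
  """
  @spec final_attempt?(map()) :: boolean()
  def final_attempt?(%{attempt: attempt, max_attempts: max_attempts})
      when is_integer(attempt) and is_integer(max_attempts) do
    # >= (rather than ==) mirrors the condition described above and also
    # tolerates an attempt count that somehow exceeds the limit.
    attempt >= max_attempts
  end
end
```

For a job with max_attempts of 3, this returns false for attempts 1 and 2 and true for attempt 3, so only the failure that leads Oban to discard the job would reach Sentry.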

Operators can still opt into every failed attempt by setting SENTRY_OBAN_REPORT_RETRIES=true.
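One way to combine the opt-in with the final-attempt rule is sketched below. It assumes the environment variable is read as a raw string, that an unset variable means false (matching the new default), and that only the value "true" (case-insensitive in this sketch) enables per-attempt reporting; Hive's actual parsing may differ. All module and function names here are hypothetical.

```elixir
defmodule HiveSketch.ObanReportingConfig do
  @moduledoc """
  Hypothetical sketch: combine the SENTRY_OBAN_REPORT_RETRIES opt-in with
  the final-attempt rule. Not Hive's actual configuration code.
  """

  @doc """
  Parses the raw environment value. Unset means false (the new default);
  this sketch treats only "true" (case-insensitive) as enabling retries.
  """
  @spec report_retries?(String.t() | nil) :: boolean()
  def report_retries?(nil), do: false

  def report_retries?(value) when is_binary(value),
    do: String.downcase(String.trim(value)) == "true"

  @doc "Reports every failed attempt when opted in, otherwise only the final one."
  @spec should_report?(map(), boolean()) :: boolean()
  def should_report?(%{attempt: attempt, max_attempts: max_attempts}, report_retries)
      when is_boolean(report_retries) do
    report_retries or attempt >= max_attempts
  end

  @doc "Reads the opt-in flag from the process environment at runtime."
  @spec report_retries_from_env() :: boolean()
  def report_retries_from_env do
    report_retries?(System.get_env("SENTRY_OBAN_REPORT_RETRIES"))
  end
end
```

Keeping the parsing separate from the decision makes both halves easy to test without touching the real environment.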

Impact

Retryable background job failures no longer alert by default while Oban is still retrying. Jobs that exhaust all attempts continue to report to Sentry with the same Oban metadata.

Helm users get the same quieter default through the chart values. There is no migration required.

Validation

mix format
mix test test/hive/sentry_event_filter_test.exs test/hive/forage/github_issue_classification_worker_test.exs
helm template hive infra/helm/hive
mix compile --warnings-as-errors
helm template hive infra/helm/hive -f infra/helm/hive/values-production.yaml
mix test passed with 737 tests
