
fix(observability): report Oban failures after retries

GitHub issue · Closed

Metadata
Source: tuist/hive #82
Updated: Jun 24, 2026
Domains: Hive
Details

What changed

By default, Hive now reports an Oban job failure to Sentry only after the job has exhausted all of its attempts.

The change adds a named reporting predicate for the Sentry Oban integration, switches the runtime default for SENTRY_OBAN_REPORT_RETRIES to false, updates the Helm chart default to match, and documents the new behavior for self-hosted deployments.

Why

The Sentry issue came from a first-attempt timeout in the GitHub issue classification worker. Oban still had retry attempts available, so reporting the first failure created alert noise before the background job had a chance to recover.

Fixes HIVE-A.

Root cause

Hive configured Sentry to report retryable Oban failures by default. That meant an agent session timeout generated a Sentry event on attempt one even though Oban would retry the job and only discard it after all attempts were exhausted.

Approach

Keep the worker behavior unchanged so timeout errors still flow through Oban and use Oban’s retry budget. Move the noise control to Sentry’s Oban integration by installing a callback that reports only when attempt >= max_attempts.

Operators can still opt into every failed attempt by setting SENTRY_OBAN_REPORT_RETRIES=true.
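The reporting rule above can be sketched as a small pure function. This is a minimal illustration rather than Hive's actual code: the module name `ObanReportSketch`, both function names, and the choice to treat only the literal string "true" as opt-in are assumptions. Only the `attempt >= max_attempts` condition and the `SENTRY_OBAN_REPORT_RETRIES` variable name come from the description.

```elixir
defmodule ObanReportSketch do
  @moduledoc false

  # Hypothetical parser: only the literal string "true" opts in.
  # Hive's real parsing of SENTRY_OBAN_REPORT_RETRIES may accept other values.
  def report_retries?(value), do: value == "true"

  # `job` can be any map with :attempt and :max_attempts keys,
  # which Oban.Job structs also carry.
  def should_report?(%{attempt: attempt, max_attempts: max_attempts}, report_retries) do
    report_retries or attempt >= max_attempts
  end
end

# An unset variable yields nil, so retries are not reported by default.
report_retries = ObanReportSketch.report_retries?(System.get_env("SENTRY_OBAN_REPORT_RETRIES"))

# First of three attempts: reported only when the operator opted in.
ObanReportSketch.should_report?(%{attempt: 1, max_attempts: 3}, report_retries)
```

Keeping the decision in one named function makes it easy to unit test without starting Oban or Sentry. How the function is wired into the Sentry Oban integration is not shown here.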

Impact

Retryable background job failures no longer alert by default while Oban is still retrying. Jobs that exhaust all attempts continue to report to Sentry with the same Oban metadata.
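To make the difference concrete, the sketch below lists which attempts would produce a Sentry event for a job that fails on every attempt. The module and function names are hypothetical, and the value of three attempts is chosen only for illustration. It is not taken from the worker's configuration.

```elixir
defmodule ReportedAttemptsSketch do
  @moduledoc false

  # Returns the attempt numbers that would be reported for a job that
  # fails on every attempt. Illustration only, not Hive's implementation.
  def reported_attempts(max_attempts, report_retries) do
    Enum.filter(1..max_attempts, fn attempt ->
      report_retries or attempt >= max_attempts
    end)
  end
end

# New default (SENTRY_OBAN_REPORT_RETRIES=false): only the final attempt.
ReportedAttemptsSketch.reported_attempts(3, false)
# => [3]

# Opt-in (SENTRY_OBAN_REPORT_RETRIES=true): every failed attempt.
ReportedAttemptsSketch.reported_attempts(3, true)
# => [1, 2, 3]
```

A job that recovers on a later attempt never fails on its final attempt, so under the default it produces no event at all.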

Helm users get the same quieter default through the chart values. No migration is required.

Validation

  • mix format
  • mix test test/hive/sentry_event_filter_test.exs test/hive/forage/github_issue_classification_worker_test.exs
  • helm template hive infra/helm/hive
  • mix compile --warnings-as-errors
  • helm template hive infra/helm/hive -f infra/helm/hive/values-production.yaml
  • mix test (passed, 737 tests)