fix(server): use maintained opentelemetry_finch to stop streamed-request telemetry crash
GitHub issue · Closed
What changed
Switches the opentelemetry_finch dependency in the server app from the stale Hex package to the maintained version in the opentelemetry-erlang-contrib monorepo — the same source we already use for opentelemetry_ecto. Scoped to server only; the cache app is left unchanged.
Why
The recurring production error ** (stop) {:badmap, {%Req.Request{}, %Req.Response{}}} (Sentry TUIST-Z4, domain telemetry, telemetry.erl:214) is not a bug in FetchLogsWorker. It’s a bug in opentelemetry_finch 0.2.0, whose [:finch, :request, :stop] handler does:
    status =
      case meta.result do
        {:ok, response} -> response.status
        _ -> 0
      end
meta.result is {:ok, %Finch.Response{}} for Finch.request/3, but for Finch.stream_while/5 it is {:ok, acc} where acc is the caller’s accumulator. Req uses Finch.stream_while/5 whenever :into is a function — which FetchLogsWorker does to stream the GitHub Actions log archive from Azure — and its accumulator is a {request, response} tuple. So response.status becomes {req, resp}.status, which raises BadMapError.
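The failure can be reproduced without Finch or Req. The following is a minimal sketch, assuming plain maps as stand-ins for %Req.Request{}, %Req.Response{} and %Finch.Response{}; the module name StopHandlerRepro is hypothetical and only mirrors the case expression quoted above. The rescue is deliberately broad, because the sketch does not assume which exception module a given Elixir version reports for field access on a tuple.

```elixir
defmodule StopHandlerRepro do
  # Mirrors the status lookup quoted above; not the library's actual module.
  def status(meta) do
    case meta.result do
      {:ok, response} -> response.status
      _ -> 0
    end
  end
end

# Shape described for Finch.request/3: {:ok, response}.
plain = StopHandlerRepro.status(%{result: {:ok, %{status: 200}}})

# Shape described for a streamed Req call: {:ok, {request, response}}.
streamed_meta = %{result: {:ok, {%{method: :get}, %{status: 200}}}}

streamed =
  try do
    StopHandlerRepro.status(streamed_meta)
  rescue
    # Field access on a tuple fails; report which exception was raised.
    e -> {:raised, e.__struct__}
  end

IO.inspect(plain: plain, streamed: streamed)
```

The first call returns the status, while the second never reaches a status value because the {:ok, response} clause binds response to the whole accumulator tuple.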
Root cause and why it kept recurring
When a :telemetry handler raises, :telemetry permanently detaches it. The handler therefore crashes once per pod (hence Occurrences: 1), and Finch tracing goes silently dark for that pod. The error reappears on the next deploy, when a fresh pod re-attaches the handler and the first streamed download crashes it again. It only shows up in production because the OpenTelemetry handlers attach only when traces_exporter != :none.
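The detach-on-failure behavior can be observed in isolation. This is a rough sketch, assuming network access so that Mix.install can fetch :telemetry from Hex; the event name, handler id and DetachDemo module are hypothetical and unrelated to the real Finch handler.

```elixir
# Assumes this runs as a standalone script with access to Hex.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule DetachDemo do
  # Hypothetical handler that always fails, standing in for the crashing stop handler.
  def handle(_event, _measurements, _metadata, _config) do
    raise "boom"
  end
end

event = [:demo, :request, :stop]
:ok = :telemetry.attach("demo-handler", event, &DetachDemo.handle/4, nil)

attached_before = length(:telemetry.list_handlers(event))

# The raise is caught inside :telemetry, which logs the failure and detaches the handler.
:ok = :telemetry.execute(event, %{duration: 1}, %{})

attached_after = length(:telemetry.list_handlers(event))

IO.inspect(before: attached_before, after: attached_after)
```

After the first failing execute, the handler count for the event drops to zero, and later events are no longer delivered to it until something attaches it again.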
The two redirect fixes from last week (#11126, #11175) correctly fixed redirect + streaming composition in the worker, but the crash lives in the OTel telemetry handler and is tripped by any streaming Req request, so those fixes never touched it.
Why this solution
The published Hex package opentelemetry_finch 0.2.0 is a Bancolombia fork that has been unmaintained since November 2022 — we cannot get a fix merged/released there. The canonical, maintained instrumentation lives in opentelemetry-erlang-contrib (v0.3.0), and it already handles this case — its extract_status/1 matches the streaming {:ok, {_, %{status: status}}} tuple and falls back to nil for anything unexpected, instead of blindly calling response.status:
    defp extract_status({:ok, %{status: status}}) when is_integer(status), do: status
    defp extract_status({:ok, {status, _, _}}) when is_integer(status), do: status
    defp extract_status({:ok, {_, %{status: status}}}) when is_integer(status), do: status
    defp extract_status(_), do: nil
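To see how those clauses treat each result shape, here is a small sketch that copies them into a hypothetical ExtractStatusDemo module and makes them public so they can be called directly. Plain maps stand in for the Finch and Req structs, and the sample inputs are illustrative.

```elixir
defmodule ExtractStatusDemo do
  # Clauses copied from the snippet above, changed from defp to def for the demo.
  def extract_status({:ok, %{status: status}}) when is_integer(status), do: status
  def extract_status({:ok, {status, _, _}}) when is_integer(status), do: status
  def extract_status({:ok, {_, %{status: status}}}) when is_integer(status), do: status
  def extract_status(_), do: nil
end

results = [
  # {:ok, response}, as described for Finch.request/3
  ExtractStatusDemo.extract_status({:ok, %{status: 200}}),
  # three-element accumulator whose first element is an integer status
  ExtractStatusDemo.extract_status({:ok, {206, [], ""}}),
  # {request, response} accumulator, as described for streamed Req calls
  ExtractStatusDemo.extract_status({:ok, {%{method: :get}, %{status: 200}}}),
  # anything unexpected falls through to nil instead of raising
  ExtractStatusDemo.extract_status({:ok, :some_other_accumulator}),
  ExtractStatusDemo.extract_status({:error, :timeout})
]

IO.inspect(results)
```

The catch-all clause is what keeps the handler attached: an unrecognized accumulator yields nil rather than an exception.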
We already consume that monorepo via git for opentelemetry_ecto, the module name and OpentelemetryFinch.setup() API are unchanged, and it requires opentelemetry_api ~> 1.4 (already satisfied) — so this is a drop-in, no-vendoring fix. An earlier revision of this PR vendored a patched handler in tuist_common; that has been removed in favor of using upstream directly.
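For orientation, the dependency switch might look roughly like the fragment below. This is a sketch under assumptions, not the actual diff: the repository URL and the sparse path are assumed from the public layout of opentelemetry-erlang-contrib, the previous version requirement is assumed, and the pinned ref is not stated here, so it is left out.

```elixir
# server/mix.exs (hypothetical fragment, shown outside its Mix.Project module)
defp deps do
  [
    # Before (assumed form of the Hex requirement):
    # {:opentelemetry_finch, "~> 0.2.0"},

    # After: git dependency on the contrib monorepo, checking out only the
    # Finch instrumentation directory. URL and sparse path are assumptions.
    {:opentelemetry_finch,
     git: "https://github.com/open-telemetry/opentelemetry-erlang-contrib.git",
     sparse: "instrumentation/opentelemetry_finch"}
    # ...remaining dependencies unchanged
  ]
end
```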
The cache app uses the same Hex package and carries the same latent bug, but is intentionally left out of this PR to keep changes to cache minimal — it can adopt the maintained source separately if/when it matters there.
User/developer impact
- Server Finch client spans keep being emitted after the first streaming request (previously tracing went dark per pod).
- No more :badmap telemetry crash noise in Sentry from the server.
Validation
- mix deps.get resolves the contrib source; server compiles clean.
- Net diff is limited to server/mix.exs + server/mix.lock.