fix(server): use maintained opentelemetry_finch to stop streamed-request telemetry crash

GitHub issue · Closed

Source: tuist/tuist #11275
Updated: Jun 24, 2026

What changed

Switches the opentelemetry_finch dependency in the server app from the stale Hex package to the maintained version in the opentelemetry-erlang-contrib monorepo — the same source we already use for opentelemetry_ecto. Scoped to server only; the cache app is left unchanged.

Why

The recurring production error ** (stop) {:badmap, {%Req.Request{}, %Req.Response{}}} (Sentry TUIST-Z4, domain telemetry, telemetry.erl:214) is not a bug in FetchLogsWorker. It’s a bug in opentelemetry_finch 0.2.0, whose [:finch, :request, :stop] handler does:

status =
  case meta.result do
    {:ok, response} -> response.status
    _ -> 0
  end

meta.result is {:ok, %Finch.Response{}} for Finch.request/3, but for Finch.stream_while/5 it is {:ok, acc} where acc is the caller’s accumulator. Req uses Finch.stream_while/5 whenever :into is a function — which FetchLogsWorker does to stream the GitHub Actions log archive from Azure — and its accumulator is a {request, response} tuple. So response.status becomes {req, resp}.status, which raises BadMapError.
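The failure can be reproduced without Finch or OpenTelemetry. This is a minimal sketch, not the library's code: the module name NaiveStatus, the try_status/1 helper, and the plain maps standing in for %Req.Request{} and %Req.Response{} are illustrative assumptions. Only the case expression mirrors the handler quoted above.

```elixir
defmodule NaiveStatus do
  # Mirrors the case expression from the 0.2.0 handler quoted above.
  def status(result) do
    case result do
      {:ok, response} -> response.status
      _ -> 0
    end
  end

  # Demo helper (hypothetical): report the outcome instead of crashing the script.
  def try_status(result) do
    {:ok, status(result)}
  rescue
    error -> {:raised, error.__struct__}
  end
end

# Shape produced by a plain request: a response with a :status field.
IO.inspect(NaiveStatus.try_status({:ok, %{status: 200}}))

# Shape produced by a streamed request made through Req: the accumulator,
# here a {request, response} tuple (plain maps stand in for the Req structs).
IO.inspect(NaiveStatus.try_status({:ok, {%{url: "..."}, %{status: 200}}}))
```

The second call takes the {:ok, response} branch because the pattern only checks for a two-element :ok tuple, so the dot access then runs against a tuple instead of a map and raises.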

Root cause and why it kept recurring

When a :telemetry handler raises, :telemetry permanently detaches it. So the handler crashes once per pod (hence Occurrences: 1), Finch tracing silently goes dark for that pod, and the error reappears on the next deploy, when a fresh pod re-attaches the handler and the first streamed download crashes it again. It only shows up in production because the OpenTelemetry handlers attach only when traces_exporter != :none.
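The "one crash, then silence" pattern can be illustrated with a toy dispatcher. This is a deliberately simplified model, not :telemetry's implementation: ToyTelemetry, its execute/2 function, and the :finch_stop handler id are all hypothetical names, and handlers are kept in a plain map rather than in :telemetry's own storage.

```elixir
defmodule ToyTelemetry do
  # Toy model only: handlers are a map of id => function, and execute/2
  # returns the handlers that are still attached after the event.
  def execute(handlers, metadata) do
    Enum.reduce(handlers, handlers, fn {id, fun}, alive ->
      try do
        fun.(metadata)
        alive
      rescue
        # A handler that raises is dropped and never called again.
        _error -> Map.delete(alive, id)
      end
    end)
  end
end

handlers = %{
  finch_stop: fn meta ->
    {:ok, response} = meta.result
    response.status
  end
}

# 1. A normal request: the handler runs and stays attached.
handlers = ToyTelemetry.execute(handlers, %{result: {:ok, %{status: 200}}})
IO.inspect(Map.keys(handlers))

# 2. The first streamed request: the handler raises and is removed.
handlers = ToyTelemetry.execute(handlers, %{result: {:ok, {%{}, %{status: 200}}}})
IO.inspect(Map.keys(handlers))

# 3. Later requests find no handler: no spans, and no further errors either.
handlers = ToyTelemetry.execute(handlers, %{result: {:ok, %{status: 200}}})
IO.inspect(Map.keys(handlers))
```

In this model the error is only ever observed at step 2, which matches the single occurrence per pod described above.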

The two redirect fixes from last week (#11126, #11175) correctly fixed redirect + streaming composition in the worker, but the crash lives in the OTel telemetry handler and is tripped by any streaming Req request, so those fixes never touched it.

Why this solution

The published Hex package opentelemetry_finch 0.2.0 is a Bancolombia fork that has been unmaintained since November 2022 — we cannot get a fix merged/released there. The canonical, maintained instrumentation lives in opentelemetry-erlang-contrib (v0.3.0), and it already handles this case — its extract_status/1 matches the streaming {:ok, {_, %{status: status}}} tuple and falls back to nil for anything unexpected, instead of blindly calling response.status:

defp extract_status({:ok, %{status: status}}) when is_integer(status), do: status
defp extract_status({:ok, {status, _, _}}) when is_integer(status), do: status
defp extract_status({:ok, {_, %{status: status}}}) when is_integer(status), do: status
defp extract_status(_), do: nil
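To see how those clauses behave on each result shape, here is a small sketch that copies them into a throwaway module. The module name StatusSketch and the sample inputs are assumptions for illustration, and the functions are made public only so they can be called directly.

```elixir
defmodule StatusSketch do
  # Same four clauses as quoted above, made public so they can be called here.
  def extract_status({:ok, %{status: status}}) when is_integer(status), do: status
  def extract_status({:ok, {status, _, _}}) when is_integer(status), do: status
  def extract_status({:ok, {_, %{status: status}}}) when is_integer(status), do: status
  def extract_status(_), do: nil
end

# Non-streaming result: a response-like map.
IO.inspect(StatusSketch.extract_status({:ok, %{status: 200}}))
# => 200

# Streaming through Req: a {request, response} accumulator.
IO.inspect(StatusSketch.extract_status({:ok, {%{}, %{status: 206}}}))
# => 206

# Three-element accumulator whose first element is an integer status.
IO.inspect(StatusSketch.extract_status({:ok, {404, [], ""}}))
# => 404

# Anything else falls through to nil instead of raising.
IO.inspect(StatusSketch.extract_status({:error, :timeout}))
# => nil
IO.inspect(StatusSketch.extract_status({:ok, :some_other_acc}))
# => nil
```

Because every clause pattern-matches before touching a field, an unexpected accumulator yields nil rather than an exception, so the handler is not detached.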

We already consume that monorepo via git for opentelemetry_ecto, the module name and OpentelemetryFinch.setup() API are unchanged, and it requires opentelemetry_api ~> 1.4 (already satisfied) — so this is a drop-in, no-vendoring fix. An earlier revision of this PR vendored a patched handler in tuist_common; that has been removed in favor of using upstream directly.
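For readers unfamiliar with pulling a single package out of a monorepo, the dependency entry would look roughly like the fragment below. This is a sketch, not the actual diff: the repository URL and the instrumentation/opentelemetry_finch subdirectory are assumptions about the contrib repo's layout, and no ref is shown because the description above does not state one.

```elixir
# server/mix.exs, inside deps/0 (fragment, not a complete file)
{:opentelemetry_finch,
 # Assumed URL of the opentelemetry-erlang-contrib monorepo.
 git: "https://github.com/open-telemetry/opentelemetry-erlang-contrib.git",
 # Mix's :sparse option checks out only this subdirectory (assumed path).
 sparse: "instrumentation/opentelemetry_finch"}
```

Adding a ref: option to a git dependency like this keeps builds reproducible beyond what mix.lock already records.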

The cache app uses the same Hex package and carries the same latent bug, but is intentionally left out of this PR to keep changes to cache minimal — it can adopt the maintained source separately if/when it matters there.

User/developer impact

  • Server Finch client spans keep being emitted after the first streaming request (previously tracing went dark per pod).
  • No more :badmap telemetry crash noise in Sentry from the server.

Validation

  • mix deps.get resolves the contrib source; server compiles clean.
  • Net diff is limited to server/mix.exs + server/mix.lock.

🤖 Generated with Claude Code

Comments
tuist-atlas[bot] Jun 16, 2026

The fix for the streamed-request telemetry crash is now available. Update to xcresult-processor-image@0.21.0 to use it.

tuist-atlas[bot] Jun 16, 2026

The fix using maintained opentelemetry_finch to stop streamed-request telemetry crashes is now available in server@1.212.0. Update to this version to get it.