feat(server): runner job steps + log capture (live tail, per-step)

Metadata

Source

tuist/tuist #10985

Updated

Jun 24, 2026

Domains

Compute

Details

Surfaces a runner job’s steps and logs on its detail page. On workflow_job: completed, the server pulls the canonical log from GitHub’s Actions Logs API, parses it into per-line rows in ClickHouse, and gzips a full-log archive to S3 for fast downloads. Steps land in their own ClickHouse table so step-level analytics stay first-class.

What it looks like

The shipping changelog entry — server/priv/marketing/changelog/2026.06.03-runner-job-steps-and-logs.md — will render at /changelog once the PR ships.

Steps card, with two steps expanded — per-step line numbers, foldable ##[group]Run … block, ANSI colours rendered, body indented under the chevron:

[Screenshot: Steps card]

Logs tab, the flat live-tail view with the same colourised rendering, full-date GMT timestamps, and tight line spacing:

[Screenshot: Logs tab]

What changed

Steps

Capture the GitHub workflow_job steps array (name / status / conclusion / number / start+finish timestamps) from the workflow_job.completed webhook.
Render a Steps card on the detail page. Expanding a step reveals the log lines that belong to it, sliced by walking the ##[group]Run … markers in the captured log (see Tuist.Runners.JobLogs.lines_grouped_by_step/2).
Storage: each step is its own row in runner_job_steps, a ReplacingMergeTree keyed (workflow_job_id, number) with inserted_at as the version column and account_id denormalised. See “Why a steps table” below.
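The marker-based slicing described above can be sketched as follows. This is an illustrative Python model, not the Elixir code in Tuist.Runners.JobLogs.lines_grouped_by_step/2; the function name slice_by_run_markers and the decision to put pre-marker lines in a leading slice are assumptions made for the example.

```python
GROUP_RUN = "##[group]Run "  # marker GitHub emits when a run step starts


def slice_by_run_markers(messages):
    """Split log messages into slices, starting a new slice at each marker.

    Lines before the first marker form a leading slice (assumption: this is
    where job set-up output would land). Each slice is a list of
    (per_slice_line_number, message) with numbering restarting at 1.
    """
    slices = []
    current = []
    for message in messages:
        # A marker closes the slice collected so far and opens a new one.
        if message.startswith(GROUP_RUN) and current:
            slices.append(current)
            current = []
        current.append(message)
    if current:
        slices.append(current)
    return [[(i, m) for i, m in enumerate(s, start=1)] for s in slices]


log = [
    "Preparing job",
    "##[group]Run echo a",
    "##[endgroup]",
    "a",
    "##[group]Run echo b",
    "b",
]
step_slices = slice_by_run_markers(log)
```

Because the boundaries come from the markers rather than from step timestamps, a step that starts and finishes within the same second still gets its own non-empty slice.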

Log capture (server-side from GitHub’s Logs API)

Tuist.Runners.Workers.FetchLogsWorker runs on the webhooks queue when workflow_job: completed arrives. It mints an installation token via the GitHub App (cached for 10 min in KeyValueStore), calls GET /repos/{repo}/actions/jobs/{id}/logs, peels the ISO timestamp off each line, and appends every row to runner_job_logs via JobLogs.append/1. The job is unique on workflow_job_id for 5 min, so a redelivered webhook doesn’t double-ingest, and it retries with Oban backoff through the ~30 s window in which GitHub returns 404 while finalising the archive. The UTF-8 BOM at the head of the payload is stripped; otherwise the timestamp peel would fail on line 1 and leak the raw timestamp into the message text.
Why fetch instead of ship in-VM — step output never appears in the actions/runner Listener’s stdout or in _diag/Worker_<utc>.log: the .NET Worker spawns the user’s shell with anonymous pipes and forwards each line directly to GitHub’s ResultsLog HTTP stream. The Logs API is the only stable source of step content without modifying the runner binary or inserting a per-step shell shim. The rationale lives in the worker’s @moduledoc.
Per-line storage (runner_job_logs): a ReplacingMergeTree keyed on (workflow_job_id, line_number), partitioned by month, with a 90-day TTL that matches GitHub Actions’ own default log retention.
Archive worker (Tuist.Runners.Workers.ArchiveLogsWorker): on a successful fetch with non-empty lines, FetchLogsWorker enqueues the archive worker, which folds the full log into the plain-text download format, gzips it, uploads it to S3 at runners/{account_id}/{workflow_job_id}/runner.log.gz, and writes the key onto the job row (log_archive_key). The Oban job is unique per workflow_job_id, so a redelivery doesn’t rebuild an archive that is already uploading.
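The BOM strip and timestamp peel can be illustrated with a small Python model. It is a sketch only: the production path is the Elixir FetchLogsWorker.parse_lines/3, and the regex, the row field names, and the handling of lines without a timestamp are assumptions made for this example.

```python
import re
from datetime import datetime, timezone

BOM = "\ufeff"
# Assumed line shape: "2026-06-02T20:26:29.1234567Z message"
TS_PREFIX = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z ?(.*)$"
)


def parse_lines(payload):
    """Turn a raw log payload into per-line rows with the timestamp peeled off."""
    # Without this strip, line 1 starts with the BOM, the regex below does
    # not match, and the raw timestamp stays inside the message text.
    if payload.startswith(BOM):
        payload = payload[len(BOM):]
    lines = payload.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty tail left by a trailing newline
    rows = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.rstrip("\r")
        match = TS_PREFIX.match(raw)
        if match:
            base, fraction, message = match.groups()
            # datetime holds microseconds, so keep at most six fraction digits.
            micros = int((fraction or "")[:6].ljust(6, "0"))
            timestamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(
                microsecond=micros, tzinfo=timezone.utc
            )
        else:
            timestamp, message = None, raw  # no timestamp: keep the line as-is
        rows.append(
            {"line_number": number, "timestamp": timestamp, "message": message}
        )
    return rows


payload = (
    "\ufeff2026-06-02T20:26:29.1234567Z ##[group]Run make\r\n"
    "2026-06-02T20:26:30.0000000Z ok\n"
    "no timestamp here\n"
)
rows = parse_lines(payload)
```

Each row carries its own line_number, which is what makes (workflow_job_id, line_number) usable as the storage key.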

Out of scope for this PR: a runner-side feeder for true live tail. The detail page renders the captured log on mount; there is no in-flight streaming surface in this PR.

Detail page UI

Top-level tabs: Overview (Steps + CI Details) and Logs.
Logs view: terminal-styled scrollable panel, ANSI SGR rendered as coloured spans, full-log substring search (server-side via CH ILIKE, not just the loaded tail), show / hide timestamps (full Tue, 02 Jun 2026 20:26:29 GMT format, off by default), pagination (most recent 200 lines on mount plus “Load older logs”).
Steps card: each step expands into its slice with per-step line numbers restarting at 1, foldable ##[group]Run … blocks (native <details>/<summary>), and the body indented under the chevron. Slicing walks ##[group]Run markers, not timestamps, so sub-second steps don’t collapse to an empty window. See Tuist.Runners.JobLogs.lines_grouped_by_step/2 and Tuist.Runners.LogFormatter.
Download logs: hits /runners/runs/.../jobs/.../logs/download. When the archive exists, the endpoint 302-redirects to a presigned S3 URL (with a response-content-disposition override for a friendly .log.gz filename) so the bytes come straight from S3. While the archive hasn’t landed yet, the endpoint falls back to streaming runner_job_logs in batches via send_chunked. Access is rechecked at click time (no presigned URLs minted on every page render, no bucket layout in the rendered HTML).
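The download endpoint’s choice between redirect and fallback can be modelled roughly like this. It is a Python sketch, not the RunnerJobLogsController code: download_plan, in_batches, the presign callable, and the 403 response for a failed access check are all assumptions introduced for illustration.

```python
def download_plan(job, authorized, presign):
    """Decide how to serve a log download at click time.

    `job` is a dict that may carry "log_archive_key"; `presign` is a
    hypothetical callable that turns an S3 key into a presigned URL.
    """
    if not authorized:
        # Assumption: a failed access check is answered with 403.
        return {"status": 403}
    key = job.get("log_archive_key")
    if key:
        # Archive exists: redirect so the bytes come straight from S3.
        return {"status": 302, "location": presign(key)}
    # Archive not written yet: stream the per-line rows in chunks.
    return {"status": 200, "body": "chunked"}


def in_batches(rows, size):
    """Yield rows in fixed-size batches, as a chunked fallback would send them."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def fake_presign(key):
    # Stand-in for a real presigner; the URL shape is invented.
    return "https://s3.example/" + key + "?sig=x"


archived = download_plan(
    {"log_archive_key": "runners/1/2/runner.log.gz"}, True, fake_presign
)
pending = download_plan({"log_archive_key": None}, True, fake_presign)
```

Since the presigned URL is only minted inside this decision, nothing about the bucket layout has to appear in the rendered page.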

Why ClickHouse for logs (not object storage)

The interactive reads on a job’s log are all ordered scans over append-only data: per-step slicing pulls WHERE workflow_job_id = ? ORDER BY line_number, the Logs view tails the most recent N, search is a server-side ILIKE across every captured line, and “Load older” walks line_number backwards. CH’s order key serves all of these. The S3 archive sits on top — once a job is done, a single gzip object is cheaper to serve than a chunked CH scan on every download click — but CH remains the source of truth for slicing and search.
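The three read patterns can be mimicked in memory to show why an order key on line_number covers them. This Python sketch stands in for the ClickHouse queries and is not taken from the PR; the function names are hypothetical, and the substring match only approximates ILIKE (which also treats % and _ as wildcards).

```python
def tail(lines, limit=200):
    """Most recent `limit` rows, oldest first (the view shown on mount)."""
    return lines[-limit:] if limit > 0 else []


def load_older(lines, before_line_number, limit=200):
    """Up to `limit` rows immediately before the oldest row already shown."""
    older = [row for row in lines if row[0] < before_line_number]
    return older[-limit:] if limit > 0 else []


def search(lines, needle):
    """Case-insensitive substring match across every captured line."""
    lowered = needle.lower()
    return [row for row in lines if lowered in row[1].lower()]


# Rows are (line_number, message), kept sorted by line_number.
lines = [(i, f"line {i}") for i in range(1, 501)]
first_page = tail(lines)
second_page = load_older(lines, before_line_number=first_page[0][0])
hits = search(lines, "LINE 42")
```

Every function is a contiguous range or a full pass over rows already sorted by line_number, which is the access shape the paragraph above attributes to the order key.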

Why a steps table (not a JSON blob)

Step-level analytics (failure rate per step name across a workflow, p95 of the Build step, slowest steps) are ordinary GROUP BY / quantile() queries when each step is a row. With steps stored as a JSON column on runner_jobs, the same questions would need application-side JSON parsing across thousands of jobs. The row shape keeps those dashboards as plain CH queries.
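To make the row-shape argument concrete, here is a small Python stand-in for the kind of aggregation that becomes a GROUP BY once each step is a row. The field names follow the webhook attributes listed earlier (name, conclusion); duration_s and both function names are assumptions for the example, and none of this is code from the PR.

```python
from collections import defaultdict


def failure_rate_by_step(rows):
    """Failure rate per step name, the in-memory equivalent of a GROUP BY name."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        totals[row["name"]] += 1
        if row["conclusion"] == "failure":
            failures[row["name"]] += 1
    return {name: failures[name] / totals[name] for name in totals}


def slowest_steps(rows, top=3):
    """Step names ordered by their longest observed duration, slowest first."""
    longest = defaultdict(float)
    for row in rows:
        longest[row["name"]] = max(longest[row["name"]], row["duration_s"])
    return sorted(longest, key=longest.get, reverse=True)[:top]


# Hypothetical step rows; duration_s would be derived from the
# start and finish timestamps captured from the webhook.
steps = [
    {"name": "Build", "conclusion": "success", "duration_s": 120.0},
    {"name": "Build", "conclusion": "failure", "duration_s": 95.0},
    {"name": "Build", "conclusion": "success", "duration_s": 130.0},
    {"name": "Build", "conclusion": "success", "duration_s": 110.0},
    {"name": "Test", "conclusion": "success", "duration_s": 300.0},
    {"name": "Test", "conclusion": "success", "duration_s": 280.0},
]
rates = failure_rate_by_step(steps)
ranking = slowest_steps(steps)
```

With a JSON column, the loop body would first have to decode every job’s steps array before any grouping could start.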

Validation

Tests green across the runners surface: JobLogs, JobSteps, Jobs, Dispatch, FetchLogsWorker, ArchiveLogsWorker, PruneArchivedLogsWorker, LogFormatter, RunnersController, RunnerJobLogsController (download redirect + chunked fallback), RunnerJobLive. mix credo + mix format clean.
Staging end-to-end (runs 26843915002, 26845466475, 26845984478, 26869684986): workflow_job: completed webhook → FetchLogsWorker fetches GH log in <1 s → JobLogs.append → log_state flips to complete → ArchiveLogsWorker uploads gzip to S3 → Download logs 302-redirects to a presigned URL that decompresses to the full log.
Archive worker verified end-to-end against local minio: seeded log lines produced a gzip object; the presigned URL returns 200 with the expected Content-Disposition and decompresses to the full log.

How to test locally

cd server && mise exec -- mix test test/tuist/runners/ test/tuist_web/controllers/runners_controller_test.exs test/tuist_web/controllers/runner_job_logs_controller_test.exs test/tuist_web/live/runner_job_live_test.exs
Run the server with seeded data (a smoke fixture under priv/repo/fixtures/runner_smoke.log is parsed via the same FetchLogsWorker.parse_lines/3 path used in production and lands at /tuist/runners/runs/4900010/jobs/4900001). The Steps card lists each step (expand for its slice of the log), the Logs tab shows the captured stream with search / timestamps / pagination, and the Download logs button redirects to S3 once the archive worker has run.

🤖 Generated with Claude Code

Comments

github-actions[bot] Jun 3, 2026

🚨 TruffleHog Secret Scan Failed

Verified secrets were detected in this pull request.

Please take the following actions:

Rotate the exposed credential(s) immediately - assume they are compromised
Remove the secret from your code - use environment variables or a secrets manager instead
If the secret was committed previously, you may need to rewrite git history using git filter-repo or similar tools

For more information, check the workflow run logs.

tuist-atlas[bot] Jun 6, 2026

This feature is now available in xcresult-processor-image@0.12.0. Update to this version to use it.

tuist-atlas[bot] Jun 6, 2026

The runner job steps and log capture feature (live tail, per-step) is now available in server@1.207.0. Update to get this feature.

tuist-atlas[bot] Jun 6, 2026

This feature is now available in runner-image@0.3.0. Update to this version to access runner job steps and log capture functionality.

tuist-atlas[bot] Jun 6, 2026

This release includes the changes from this pull request. The runner job steps and log capture feature (live tail, per-step) is now available in linux-runner-image@0.5.0. Update to this version to use the new functionality.