
fix(kura): flush REAPI ByteStream uploads before persisting

GitHub issue · Closed

Metadata
Source: tuist/tuist #11129
Updated: Jun 24, 2026
Domains: Kura
Details

What

The REAPI ByteStream/Write handler streamed upload chunks into a temp file with write_all, then persisted the blob by re-opening that path on a separate descriptor (stat + copy into a segment) — without flushing the temp file first. This flushes and closes the temp file before persisting, and adds a regression test.

Why (root cause)

tokio::fs::File buffers writes and flushes lazily. With no explicit flush, the persist path’s stat + copy on a separate descriptor raced the background flush and intermittently failed with the error "failed to persist CAS blob: appended N bytes into segment …, expected M" (ByteStream/Write returned INTERNAL).
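The visibility gap can be reproduced in miniature with the standard library alone. The sketch below is an analogy, not Kura's code: it uses std::io::BufWriter in place of tokio::fs::File, and the helper name and temp-file path are hypothetical. It shows that a stat through the path reports fewer bytes than were written until the buffered handle is flushed.

```rust
use std::fs::{self, File};
use std::io::{BufWriter, Write};
use std::path::Path;

/// Writes `payload` through a buffered handle and returns the on-disk size
/// observed via a separate `stat` before and after an explicit flush.
/// Hypothetical helper for illustration; not part of Kura.
fn sizes_before_and_after_flush(path: &Path, payload: &[u8]) -> std::io::Result<(u64, u64)> {
    // Capacity larger than the payload, so nothing reaches the OS until flush.
    let mut writer = BufWriter::with_capacity(payload.len() + 1, File::create(path)?);
    writer.write_all(payload)?;

    // A separate look at the path: the bytes are still in the user-space buffer.
    let before = fs::metadata(path)?.len();

    writer.flush()?;
    let after = fs::metadata(path)?.len();
    Ok((before, after))
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("bufwriter-demo-{}.bin", std::process::id()));
    let (before, after) = sizes_before_and_after_flush(&path, &[7u8; 1024])?;
    // prints "before flush: 0 bytes, after flush: 1024 bytes"
    println!("before flush: {before} bytes, after flush: {after} bytes");
    fs::remove_file(&path)?;
    Ok(())
}
```

In this synchronous analogy the gap is deterministic. In the handler the flush happened in the background, so the persist step only sometimes read a short file, which is why the failure was intermittent.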

The failure rate scales with the number of concurrent blob uploads, so it reliably broke remote caching of cargo build scripts in particular. A build script’s OUT_DIR directory output uploads dozens of blobs at once (e.g. librocksdb-sys ≈ 339 .o files), and a subset of those ByteStream/Write calls failed. Bazel then skipped UpdateActionResult for the affected actions, so they re-executed on every build: librocksdb-sys recompiled on every clean build (~8 min), cascading to everything that links it. Plain file-output actions (rustc rlibs) upload a single blob and rarely raced, so they cached fine and masked the bug.

Found while dogfooding Kura as the Bazel remote cache for the Kura Bazel build: a fresh build got hundreds of Kura cache hits yet still recompiled rocksdb.

Why this fix

Kura’s HTTP upload path (read_request_to_temp in utils.rs) already flushes before persisting; the ByteStream handler was the lone exception. Flushing + dropping the handle makes persist re-read a fully-written, closed file — minimal and matches the existing pattern. The action-cache and per-blob CAS round-trips were already correct; only the upload’s durability ordering was wrong.
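The ordering described above can be sketched as follows. This is a minimal synchronous illustration under stated assumptions, not the handler's actual code: the function name, the use of std::io::BufWriter, and fs::copy standing in for "copy into a segment" are all hypothetical. The only point is the sequence: write all chunks, flush, close the handle, then let the persist step read the file by path.

```rust
use std::fs::{self, File};
use std::io::{BufWriter, Write};
use std::path::Path;

/// Streams `chunks` into `temp`, then persists by re-opening `temp` by path
/// and copying it to `dest`. Returns the number of bytes copied.
/// Hypothetical stand-in for the upload-then-persist sequence.
fn upload_then_persist(temp: &Path, dest: &Path, chunks: &[&[u8]]) -> std::io::Result<u64> {
    let mut writer = BufWriter::new(File::create(temp)?);
    for chunk in chunks {
        writer.write_all(chunk)?;
    }
    // Flush, then close the handle, and only then read the file by path.
    writer.flush()?;
    drop(writer);

    let expected: u64 = chunks.iter().map(|c| c.len() as u64).sum();
    let copied = fs::copy(temp, dest)?;
    if copied != expected {
        // Mirrors the shape of the size check in the reported error.
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("appended {copied} bytes, expected {expected}"),
        ));
    }
    Ok(copied)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let temp = dir.join(format!("upload-{}.tmp", std::process::id()));
    let dest = dir.join(format!("blob-{}.bin", std::process::id()));
    let chunks: [&[u8]; 2] = [b"hello ", b"world"];
    let copied = upload_then_persist(&temp, &dest, &chunks)?;
    println!("persisted {copied} bytes");
    fs::remove_file(&temp)?;
    fs::remove_file(&dest)?;
    Ok(())
}
```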

Impact

  • Remote caching of cargo build scripts (and any action uploading many blobs concurrently / with directory outputs) now works through Kura.
  • No behavior change for already-working file-output actions.

Validation

Built a Kura image with the fix and re-ran a build-script round-trip against it, comparing Bazel --remote_grpc_log:

metric                                  before   after
ByteStream/Write failures               24/253   0/253
actions cached (UpdateActionResult)     59/81    81/81
GetActionResult not-found on rebuild    22       0
build scripts re-executed on rebuild    22       0

This reaches parity with Bazel’s local --disk_cache (81/81). A new regression test (bytestream_writes_persist_completely_under_concurrency) drives the real gRPC handler with concurrent multi-chunk uploads and asserts byte-exact round-trips; it fails reliably (5/5) without the fix and passes with it. cargo clippy --all-targets -- -D warnings and cargo fmt --check are clean.
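The shape of such a test (many concurrent multi-chunk uploads, each checked byte for byte) might look roughly like the harness below. It is only a sketch: it uses OS threads and temp files instead of the real gRPC handler, and every name in it is hypothetical rather than taken from bytestream_writes_persist_completely_under_concurrency.

```rust
use std::fs::{self, File};
use std::io::{BufWriter, Write};
use std::path::PathBuf;
use std::thread;

/// Runs `uploads` concurrent multi-chunk writes into `dir` and returns how
/// many round-tripped byte for byte. Hypothetical harness for illustration.
fn concurrent_round_trips(dir: PathBuf, uploads: usize, chunks: usize, chunk_len: usize) -> usize {
    let handles: Vec<_> = (0..uploads)
        .map(|i| {
            let dir = dir.clone();
            thread::spawn(move || -> std::io::Result<bool> {
                // Per-upload fill byte so mixed-up blobs are detectable.
                let chunk = vec![(i % 251) as u8; chunk_len];
                let temp = dir.join(format!("upload-{i}.tmp"));
                let blob = dir.join(format!("blob-{i}.bin"));

                let mut writer = BufWriter::new(File::create(&temp)?);
                for _ in 0..chunks {
                    writer.write_all(&chunk)?;
                }
                // Flush and close before the persist step reads by path.
                writer.flush()?;
                drop(writer);

                fs::copy(&temp, &blob)?;
                let expected = chunk.repeat(chunks);
                Ok(fs::read(&blob)? == expected)
            })
        })
        .collect();

    // Count only uploads that finished without error and matched exactly.
    handles
        .into_iter()
        .filter_map(|h| h.join().ok())
        .filter(|r| matches!(r, Ok(true)))
        .count()
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join(format!("roundtrip-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let ok = concurrent_round_trips(dir.clone(), 16, 4, 4096);
    println!("{ok}/16 uploads round-tripped exactly");
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```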

🤖 Generated with Claude Code

Comments
github-actions[bot] Jun 6, 2026

CLA Check Passed

Thank you @esnunes! Your contribution is ready to be reviewed. The CLA requirement has been satisfied.

tuist-atlas[bot] Jun 8, 2026

The fix in this pull request is now available in kura@0.7.1. Update to ghcr.io/tuist/kura:0.7.1 to get the changes.