Two disk-safety hardening fixes for the CAS segment ring, surfaced while stress-testing the margin assumptions behind #11222 (the disk-derived capacity work). Both matter more as that PR ships: larger rings mean far more rotations, so more chances to crash inside the rotation window and more traffic through the free-space guard.
What changed
- The rotation free-space check now scales with the incoming artifact. Previously it demanded a fixed MAX_SEGMENT_BYTES * SEGMENT_FREE_SPACE_MARGIN (1 GiB) no matter what was about to be written. The required bytes are now max(MAX_SEGMENT_BYTES, incoming_size) * SEGMENT_FREE_SPACE_MARGIN, so the requirement is unchanged for artifacts ≤ 512 MiB.
- Startup sweep for orphaned segment files. Store::sweep_orphaned_segments runs once at startup, after Store::open and before any traffic. It lists data_dir/segments/ and routes every .seg file not referenced by the persisted ring state through the existing evict_segment path. A failed sweep logs a warning and does not block startup.
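Below is a minimal sketch of the orphan-detection half of the sweep. It assumes segment files are identified by file name and that the persisted ring state can be reduced to a set of referenced names. The name find_orphaned_segments is hypothetical and is not the PR's API. The function only reports names and leaves eviction to the caller.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the `.seg` file names in `segments_dir`
/// that the persisted ring state (`referenced`) does not mention.
/// Only the "file exists, state does not reference it" direction is
/// reported; a state entry whose file is missing is never returned.
fn find_orphaned_segments(
    segments_dir: &Path,
    referenced: &HashSet<String>,
) -> io::Result<Vec<String>> {
    let entries = match fs::read_dir(segments_dir) {
        Ok(entries) => entries,
        // First boot: no segments/ directory yet, so there is nothing to sweep.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };

    let mut orphans = Vec::new();
    for entry in entries {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        // Anything that is not a segment file is outside the sweep's scope.
        if !name.ends_with(".seg") {
            continue;
        }
        if !referenced.contains(&name) {
            orphans.push(name);
        }
    }
    orphans.sort(); // deterministic order for logging and tests
    Ok(orphans)
}
```

The sketch computes a set difference over file names, which is why a missing directory and an empty directory both produce no work.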
Root causes
Oversized artifacts under-provisioned the guard. Rotation triggers when current_size + incoming > MAX_SEGMENT_BYTES, and then the whole incoming artifact is appended into the fresh segment — segment files can legitimately exceed 512 MiB (e.g. a 2 GiB replication body, MAX_REPLICATION_BODY_BYTES). The rotation check is the only free-space checkpoint (appends between rotations are unchecked), so a 2 GiB append could pass a 1 GiB guard and hit raw ENOSPC mid-append — made worse by the fact that during append_to_segment the artifact’s bytes transiently exist twice when the tmp dir shares the filesystem (the staged source is deleted only after the copy completes). The fix keeps the same 2× proportionality the margin has always encoded: one width for the new segment’s guaranteed growth, one width of slack for unchecked co-writers.
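The guard arithmetic is small enough to show directly. This sketch assumes MAX_SEGMENT_BYTES is 512 MiB and SEGMENT_FREE_SPACE_MARGIN is 2. Those values are inferred from the 1 GiB guard and the 2× proportionality described here, not read from the code. The names required_free_bytes and rotation_has_room are illustrative.

```rust
// Assumed values, inferred from the description (512 MiB segments, 2x margin,
// giving the old fixed floor of 1 GiB). The real constants may differ.
const MAX_SEGMENT_BYTES: u64 = 512 * 1024 * 1024;
const SEGMENT_FREE_SPACE_MARGIN: u64 = 2;

/// Free bytes required before rotating for an artifact of `incoming_size` bytes:
/// one width for the fresh segment's growth plus one width of slack, where the
/// width is the larger of a full segment and the incoming artifact.
fn required_free_bytes(incoming_size: u64) -> u64 {
    MAX_SEGMENT_BYTES
        .max(incoming_size)
        .saturating_mul(SEGMENT_FREE_SPACE_MARGIN)
}

/// Hypothetical guard: rotation proceeds only if enough space is available.
fn rotation_has_room(available_bytes: u64, incoming_size: u64) -> bool {
    available_bytes >= required_free_bytes(incoming_size)
}
```

Suppose 1.5 GiB is free. The old fixed check (at least 1 GiB) would let a 2 GiB append through, even though the append alone writes more than is free. Under the scaled rule the requirement is 4 GiB, so the check fails before the append starts.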
Crash-window orphans had no reclamation path. Rotation is deliberately ordered for crash consistency: persist ring state without the evicted segment → append → commit metadata → unlink the evicted file. The safe direction of that ordering means a crash (or an error return) between the state save and the unlink strands the segment file — and the manifests and index entries of every artifact inside it — with nothing referencing them and no code path that will ever clean them up. Each such event leaks up to a segment’s worth of disk permanently. The repo had no startup sweep of the segments directory at all.
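To make the crash window concrete, here is a toy model (not the store's code) of a rotation that evicts old.seg and creates new.seg, following the step order described above. It assumes the ring-state save drops the evicted segment and adds the new one in the same write, and that the new file appears at the append step. The file names and the Step enum are hypothetical.

```rust
use std::collections::BTreeSet;

/// Rotation steps, in the crash-consistent order described above.
#[derive(Clone, Copy, Debug)]
enum Step {
    SaveStateWithoutEvicted,
    Append,
    CommitMetadata,
    UnlinkEvicted,
}

const ROTATION: [Step; 4] = [
    Step::SaveStateWithoutEvicted,
    Step::Append,
    Step::CommitMetadata,
    Step::UnlinkEvicted,
];

/// Runs the first `completed` steps, then "crashes". Returns the segment files
/// that are on disk but absent from the persisted ring state (the orphans).
fn orphans_after_crash(completed: usize) -> BTreeSet<&'static str> {
    let mut state: BTreeSet<&'static str> = ["old.seg", "live.seg"].into_iter().collect();
    let mut on_disk = state.clone();
    for step in ROTATION.iter().take(completed) {
        match step {
            Step::SaveStateWithoutEvicted => {
                state.remove("old.seg");
                state.insert("new.seg"); // referenced before its file exists
            }
            Step::Append => {
                on_disk.insert("new.seg");
            }
            // Metadata commit does not change which segment files exist.
            Step::CommitMetadata => {}
            Step::UnlinkEvicted => {
                on_disk.remove("old.seg");
            }
        }
    }
    on_disk.difference(&state).copied().collect()
}
```

Crashing after any of the first three steps leaves old.seg on disk with no state entry, which is exactly what the startup sweep looks for. After the first step there is also the opposite mismatch: new.seg is referenced but not yet on disk. That is the legitimate window the sweep must leave alone, and a file-minus-state difference never reports it.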
Why this shape
- The sweep reuses evict_segment rather than just unlinking files. That path already deletes the stranded manifests, the namespace and segment index entries, and the cached FD handle (atomically, with the tested batch logic). The crash-window case is therefore cleaned up metadata-consistently, never leaving manifests pointing at deleted bytes.
- It is safe by construction. It runs under the DataDirLock writer lock (acquired before Store::open in app.rs) and before any rotation can run, so it cannot race the legitimate “state references a segment whose file does not exist yet” window of a fresh rotation. The inverse direction (state entry without a file) is untouched, as is everything outside segments/ (blob files, RocksDB).
- The required-bytes computation is extracted as a pure function so the policy is unit-testable.
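The next sketch shows how the sweep can route each orphan through eviction while keeping a failure non-fatal. The EvictSegment trait is a hypothetical stand-in for the store's evict_segment path, whose real signature is not shown here. Stopping at the first error and returning None on failure are assumptions of this sketch, not documented behavior.

```rust
/// Hypothetical stand-in for the store's eviction path (file, manifests,
/// index entries, cached FD). The real signature is not part of this PR text.
trait EvictSegment {
    fn evict_segment(&mut self, segment: &str) -> Result<(), String>;
}

/// Routes every orphan through eviction instead of a bare unlink, so metadata
/// is reclaimed together with the file. Assumed: stops at the first error.
fn sweep(store: &mut impl EvictSegment, orphans: &[String]) -> Result<usize, String> {
    let mut reclaimed = 0;
    for seg in orphans {
        store.evict_segment(seg)?;
        reclaimed += 1;
    }
    Ok(reclaimed)
}

/// Startup wrapper: a failed sweep is logged as a warning and startup continues.
fn sweep_at_startup(store: &mut impl EvictSegment, orphans: &[String]) -> Option<usize> {
    match sweep(store, orphans) {
        Ok(n) => Some(n),
        Err(e) => {
            eprintln!("warning: orphaned segment sweep failed: {e}");
            None
        }
    }
}
```

Returning an Option rather than propagating the error keeps the caller's startup path unconditional. Any orphans left behind by a failed sweep are picked up again on the next restart.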
Rollout safety
Node-local only: no on-disk format, wire format, or replication protocol changes. Segment files are still reclaimed exclusively by unlink (the mmap SIGBUS invariant in kura/AGENTS.md is upheld). Mixed-version meshes are unaffected. For existing deployments the sweep is a no-op unless a node has accumulated orphans, in which case it frees disk on the next restart.
Validation
- New unit tests:
  - segment_rotation_requires_margin_for_oversized_artifacts: fixed floor for smaller and equal sizes, scaled requirement for oversized ones.
  - sweep_orphaned_segments_returns_zero_without_segments_dir: first-boot no-op.
  - sweep_orphaned_segments_removes_stray_files_and_keeps_live_segments: stray file deleted; live segment and its artifact untouched and still readable.
  - sweep_orphaned_segments_reclaims_crash_window_segment_and_metadata: simulates the exact crash window (ring state saved without the segment, file still on disk) and asserts that both the file and the stranded manifest are reclaimed.
- cargo test --lib (238 passed, 0 failed); cargo clippy --all-targets -- -D warnings (clean); cargo fmt (no changes).
🤖 Generated with Claude Code