The fix to keep fresh replicas out of primary routing is now available in kura@0.8.4. Update to that version to apply this change.
fix(kura): keep fresh replicas out of primary routing
Fixes Kura cache misses after replica churn by keeping large upload staging on each pod’s persistent data volume, bounding that staging inside Kura, and preventing freshly restarted replicas from becoming the public primary before they have time to bootstrap from peers.
The incident showed three related failure modes. First, multipart assembly could exceed the 4 GiB /tmp/kura emptyDir limit and get the pod evicted. Second, the replacement pod could then be selected for public cache traffic before its replicated artifact set had caught up. Third, increasing only the Kubernetes tmp size would move the failure threshold without preserving Kura’s bounded resource model.
This changes managed Kura instances, the standalone chart, and the local e2e compose runtime to stage temporary data under /var/cache/kura/tmp on the per-pod data volume, removes the tmp emptyDir, and introduces KURA_TMP_DIR_MAX_BYTES as an application-level staging budget. HTTP uploads, replication bootstrap, REAPI ByteStream writes, and multipart assembly now check that budget and return backpressure instead of relying on the PVC or kubelet as the first limit. The Helm chart exposes the budget as config.tmpDirMaxBytes, defaulting to 8 GiB.
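A minimal sketch of what an application-level staging budget could look like, assuming a shared byte counter that each write path reserves against before staging data. The names `TmpBudget`, `try_reserve`, `release`, and `status_for` are hypothetical and not taken from the Kura source; only the 503 backpressure response and the idea of a byte limit (KURA_TMP_DIR_MAX_BYTES) come from the description above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical staging budget; the limit would come from KURA_TMP_DIR_MAX_BYTES.
pub struct TmpBudget {
    max_bytes: u64,
    used_bytes: AtomicU64,
}

/// Returned when a reservation would push staging past the budget.
#[derive(Debug, PartialEq, Eq)]
pub struct BudgetExceeded;

impl TmpBudget {
    pub fn new(max_bytes: u64) -> Self {
        Self { max_bytes, used_bytes: AtomicU64::new(0) }
    }

    /// Atomically reserve `bytes`. On failure the counter is left unchanged.
    pub fn try_reserve(&self, bytes: u64) -> Result<(), BudgetExceeded> {
        self.used_bytes
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                // Reject on overflow or when the new total exceeds the budget.
                used.checked_add(bytes).filter(|total| *total <= self.max_bytes)
            })
            .map(|_| ())
            .map_err(|_| BudgetExceeded)
    }

    /// Give bytes back once the staged file is committed or deleted.
    pub fn release(&self, bytes: u64) {
        let _ = self
            .used_bytes
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                Some(used.saturating_sub(bytes))
            });
    }

    /// Bytes currently reserved.
    pub fn used(&self) -> u64 {
        self.used_bytes.load(Ordering::Acquire)
    }
}

/// Map a reservation result to an HTTP status: 503 signals backpressure.
pub fn status_for(result: &Result<(), BudgetExceeded>) -> u16 {
    match result {
        Ok(()) => 200,
        Err(BudgetExceeded) => 503,
    }
}
```

In this sketch the check happens before any bytes are written, so a rejected upload costs no disk space and the caller can retry later. How Kura actually tracks usage (a counter, a directory scan, or something else) is not stated in the description.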
It also adds a 10-minute primary-routing age gate for replicated instances. Single-replica instances remain routable immediately, while multi-replica instances avoid selecting freshly restarted pods as the public primary.
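The age gate can be illustrated with a short sketch. The `Replica` type, the `primary_candidates` function, and the `PRIMARY_MIN_AGE` constant are hypothetical names chosen for illustration; only the 10-minute threshold and the single-replica exemption come from the description. What happens when every replica in a multi-replica instance is still fresh is not stated, so this sketch returns an empty list and leaves that decision to the caller.

```rust
use std::time::Duration;

/// Minimum pod age before a replica may become the public primary (10 minutes).
const PRIMARY_MIN_AGE: Duration = Duration::from_secs(10 * 60);

/// Hypothetical view of a replica as seen by the routing layer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Replica {
    pub name: String,
    /// Time since the pod last started.
    pub age: Duration,
}

/// Return the replicas eligible to serve as the public primary.
pub fn primary_candidates(replicas: &[Replica]) -> Vec<&Replica> {
    // A single-replica instance has no peer to bootstrap from,
    // so it stays routable immediately.
    if replicas.len() <= 1 {
        return replicas.iter().collect();
    }
    // With multiple replicas, skip pods that restarted too recently
    // to have caught up on replicated artifacts.
    replicas
        .iter()
        .filter(|replica| replica.age >= PRIMARY_MIN_AGE)
        .collect()
}
```

For example, given a replica that has been up for an hour and one that restarted 30 seconds ago, only the older one is returned as a candidate.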
The new e2e regression runs Kura with a deliberately small tmp staging budget, posts through the public cache API, and asserts Kura returns 503 backpressure while the process remains healthy with no restart.
How to test locally
- mise exec -- cargo check (from kura)
- mise exec -- cargo test from_lookup (from kura)
- mise exec -- cargo test read_request_to_temp (from kura)
- mise run test-e2e -- spec/e2e/tmp_budget_spec.sh (from kura)
- mise exec go -- go test ./controllers (from infra/kura-controller)
- mise exec helm -- helm template kura kura/ops/helm/kura
- git diff --check