What changed
Restores large-blob upload throughput to Kura over gRPC (REAPI ByteStream.Write). The regression has two stacked ceilings — the gateway nginx in front of Kura, and Kura’s own gRPC server — so the fix has two layers, plus a CI/deploy change that made the second layer testable on staging.
1. Gateway nginx HTTP/2 windows — raised on every Kura gateway, in both render paths:
- Dedicated gateways (
kgw-<account-hash>-<region>): the KuraGateway controller ConfigMap (gatewayNginxConfigData).
- Regional gateways (
kura-eu-central / kura-us-east / kura-us-west): the platform chart’s ingress-nginx controller.config.
client-body-buffer-size: "4m" # WINDOW_UPDATE pacing while streaming upstream
http2-max-concurrent-streams: "32" # bounds worst-case nginx memory per connection
http-snippet: "http2_body_preread_size 4m;" # initial advertised request-body window
2. Kura gRPC runtime (kura/src/reapi/mod.rs) — so the Kura hop isn’t the next bottleneck once nginx is fixed:
- Raise the tonic/hyper HTTP/2 windows from their 1 MiB defaults:
initial_stream_window_size = 4 MiB, initial_connection_window_size = 16 MiB.
- Keep
max_connection_age at 300s (so connections still recycle every 5 min and rebalance onto fresh pods after a rolling deploy) but raise max_connection_age_grace 300s → 900s. The age-boundary GOAWAY only stops new streams — an in-flight upload runs until the grace expires — so the 300s grace was the actual bug that cut large uploads at age+grace=600s. 900s comfortably outlasts the largest observed upload (~680s), and worst-case connection lifetime stays ~20 min (vs the 2h a 3600+3600 pair would let a slow-but-active stream hold a connection + temp-file slot).
- Add a per-chunk stall timeout on the upload path: each received chunk resets a 60s timer, so a connection with data actively flowing is never cut, but a stalled/vanished client is reclaimed promptly (returns
DEADLINE_EXCEEDED and removes the temp file). This is the “kill only on inactivity, not on activity” behavior — tonic has no activity-based connection timeout, so it’s enforced at the stream level where the upload actually happens.
3. Build + deploy the Kura runtime image from Server Deployment (.github/workflows/server-deployment.yml):
- The workflow already built the server and
kura-controller images per commit, but the kura runtime image is otherwise only published by release.yml on push-to-main. So deploying a branch to staging shipped the chart (layer 1) while the kura binary stayed pinned to the latest released kura@ tag — making layer 2 impossible to validate pre-release. (Confirmed in practice: a staging deploy of this branch left the pods on ghcr.io/tuist/kura:0.10.1 and the 784MB blobs still died at exactly 300+300s.)
- Now the build job also builds
ghcr.io/tuist/kura:sha-<SHA> (from source, kura/Dockerfile, mirroring the controller step). The deploy job defaults kuraRuntime.image.tag to that per-commit image on staging, while canary/production keep pinning the resolved kura@ release tag so they never run an unreleased binary. An explicit kura_runtime_image_tag input still overrides everywhere.
Why
Large Bazel REAPI uploads to Kura were stuck in an infinite retry loop. The client’s binary gRPC log showed every ByteStream.Write capped at ~310KB/s = nginx’s default 64KB HTTP/2 body window ÷ the 193ms RTT. At that rate the 784MB librocksdb-sys artifacts need ~42 min, but streams are force-closed at ~600s (Kura’s max_connection_age + grace), so Bazel restarts from byte 0 forever: Write → 502 at 600s → QueryWriteStatus → NOT_FOUND → restart.
Root cause of the window cap: regression from #11192, which moved Kura gRPC from per-instance TCP-passthrough LBs (TLS terminated by tonic, 1MB windows ≈ 5.4MB/s/stream) behind ingress-nginx gateways. That PR’s analysis covered aggregate NIC/CPU but not HTTP/2 per-stream flow control, and its smoke test exercised HTTP roundtrips, not large REAPI writes — so it shipped unnoticed. Both nginx knobs must move together: http2_body_preread_size sets the initial window, client_body_buffer_size (ingress default 8k) paces the WINDOW_UPDATEs once nginx relays the body.
The kura-runtime layer was added after a first staging run showed the nginx fix alone plateaus: per-stream throughput rose ~20x but capped at ~1.2MB effective window (Kura’s own 1 MiB tonic window on the nginx→kura hop), and the 784MB blob still died at the hard 600s max_connection_age. Raising Kura’s windows lifts the plateau; replacing the age-based kill with an inactivity timeout means active uploads finish regardless of size.
Real-infra validation (staging)
Two staging phases, against grpcs://grpc.tuist-eu-central-1-staging.kura.tuist.dev (RTT ~209ms, client in Brazil, 300 Mbps uplink):
Phase A — gateway fix only (pods still on released 0.10.1): per-stream upload rose from the ~0.3 MB/s baseline to ~6 MB/s solo, but plateaued at ~1.2MB effective window and the two 784MB blobs were severed at connection-age 600.9s (= 0.10.1’s 300+300), at 79% / 88%. This is what motivated layers 2 and 3.
Phase B — full fix deployed (pods on ghcr.io/tuist/kura:sha-518f3597e34c, the branch binary, via change #3): the regression is gone.
| Metric |
baseline (pre-fix) |
Phase A (nginx fix only) |
Phase B (nginx + runtime) |
| 784MB blob outcome |
∞ retry loop |
502 at ~88%, died at 600s |
✅ 100% in 167–223s, single stream |
| big-blob per-stream |
~0.31 MB/s |
~1.15 MB/s (before dying) |
3.5–4.7 MB/s |
| both giants combined |
— |
~2.2 MB/s (failed) |
7.0 MB/s |
| aggregate upload phase |
— |
4.0 MB/s |
5.9 MB/s |
In Phase B all 337 Writes succeeded with zero QueryWriteStatus — both ~784MB RocksDB artifacts uploaded to 100% in a single stream (784,550,880 in 167s, 783,930,496 in 223s), no connection-age teardown, no restart-from-zero.
Warm-cache run (everything already cached): 36s end-to-end, 803 GetActionResult hits + 98 Reads (90.7 MB, incl. the single 90.6 MB artifact) all OK, zero uploads, no auth errors — the happy path is fast and clean.
Local e2e validation
kura/test/e2e/grpc-upload-throughput/ — a toxiproxy + nginx + kura docker-compose harness that runs one kura backend behind baseline-vs-patched nginx configs (generated from the live chart values) under identical WAN latency, and asserts the patched path is ≥4x faster. Measured (16MB payload): at 192ms RTT, baseline 0.32 MB/s vs patched 12.24 MB/s (38.7x); baseline reproduces the production ~310 KB/s exactly. Runs as the gateway-throughput shard of the Kura e2e workflow. A controller unit test (TestGatewayNginxConfigMatchesChart) keeps the dedicated and regional render paths in sync.
Note: the e2e exercises a single stream against a bare fast kura, so it validates the nginx window in isolation and over-predicts; the staging runs are what surfaced the kura-side plateau (hence layer 2) and proved the end-to-end fix (hence layer 3).
Verification
infra/kura-controller go test ./... passes (incl. the chart-vs-controller equality assertion).
helm template renders the three window keys into each regional gateway ConfigMap (anchored to one source).
kura/src/reapi/mod.rs changes compile (cargo check --lib) and pass cargo clippy.
server-deployment.yml lints clean (actionlint); staging deploy confirmed the pods rolled onto kura:sha-518f3597e34c.
Deferred follow-up
Credential-token TTL vs. long-lived sessions. Now that uploads complete and builds run longer, this is the leading remaining symptom: in the full cold build, 1,089 of 2,886 Reads failed UNAUTHENTICATED in a single burst ~414s (~7 min) into the build — a fixed-TTL token expiring mid-invocation, not a Kura fault. Bazel doesn’t retry them, so those cache reads are lost (build still completes, with a degraded cache-hit rate where the burst lands). The warm-cache build (36s, under the TTL) never hits it. Fix is outside the upload path — longer-lived REAPI tokens or a Bazel credential helper that refreshes proactively / on UNAUTHENTICATED — and is tracked separately.
🤖 Generated with Claude Code