Hive Hive
Sign in

feat(infra,server): serve Kura HTTP cache + gRPC (Bazel REAPI) from a single host

GitHub issue · Closed

Metadata
Source
tuist/tuist #11356
Updated
Jun 24, 2026
Domains
Kura
Details

What & why

Serve the Kura HTTP cache API and gRPC (Bazel REAPI) from a single hostnamehttps://<host> for the REST cache and grpcs://<host> for the Remote Execution API — eliminating today’s separate grpc.<host> hostname. This is “Option E”: one host, path-based routing at the nginx ingress.

The backend was already a single Service exposing both http:4000 and grpc:50051; the only reason two hostnames existed is that ingress-nginx’s backend-protocol annotation is per-Ingress-object. This PR keeps two Ingress objects but co-hosts them on one host, split by path (gRPC service prefixes → grpc_pass; everything else → proxy_pass).

Implements the plan in kura-single-endpoint-grpc-http.md.

Phase 0 — validation spike (the hard gate)

The risky premise is whether ingress-nginx honors backend-protocol per-location when two Ingresses share one host. This is not the documented per-Ingress model, so it was demonstrated, not assumed.

Spun up an isolated kind cluster with ingress-nginx v1.11.3 (the same image the repo’s gateway controller pins), a single Service exposing http+grpc ports (mirroring Kura’s shape), and two Ingresses on one host spike.local:

  • Ingress A: / (Prefix) → http, backend-protocol: HTTP, TLS for the host.
  • Ingress B: REAPI/ByteStream path prefixes → grpc, backend-protocol: GRPC, no TLS of its own.

Results — PASS:

Check Result
curl https://H/up + arbitrary REST path ✅ 200 via proxy_pass (HTTP backend)
gRPC reflection (bidi-streaming) over grpcs://H ✅ service list returned
gRPC unary (DummyUnary) ✅ echo round-trip
gRPC server-streaming (DummyServerStream, the ByteStream read/write analog) ✅ 10 messages, trailers intact
Single TLS cert serves both ✅ Ingress B declared no TLS; relied on Ingress A’s cert for the shared host
Per-location backend-protocol on merged host ✅ one server block; grpc_pass for gRPC prefixes, proxy_pass for /
nginx host/annotation merge-conflict warnings ✅ none

Generated nginx.conf for the merged host (ground truth):

## start server spike.local # ← ONE server block
server { server_name spike.local ;
location ~* "^/build\.bazel\.remote\.execution\.v2\." { ... grpc_pass grpc://upstream_balancer; }
location ~* "^/google\.bytestream\." { ... grpc_pass grpc://upstream_balancer; }
location ~* "^/" { ... proxy_pass http://upstream_balancer; }
}

Critical finding that shaped the implementation

pathType: Prefix/ImplementationSpecific for the gRPC prefixes is wrong. ingress-nginx renders /build.bazel.remote.execution.v2. as location /build.bazel.remote.execution.v2./ plus an exact = /build.bazel.remote.execution.v2.. A real method path — /build.bazel.remote.execution.v2.Capabilities/GetCapabilities — matches neither (the char after the package prefix is C, not /) and falls through to / (HTTP). The fix, verified in the spike, is nginx.ingress.kubernetes.io/use-regex: "true" with anchored, regex-escaped paths (^/build\.bazel\.remote\.execution\.v2\.), which render as location ~* "..." and match correctly. This is the exact pitfall the plan flagged.

Changes by layer

Phase 1 — Operator (infra/kura-controller, Go)

  • reconcileGRPCIngress: co-hosts on spec.publicHost (not grpcPublicHost) with the REAPI/ByteStream regex path prefixes (use-regex), backend-protocol: GRPC, grpc backend port; declares no TLS (the public Ingress’s cert covers the shared host). Using PublicHost for the rule host means existing CRs that still carry a legacy grpc.<host> converge onto the single host the moment the controller rolls out.
  • reconcileGRPCCertificate: now a deleter — retires the dedicated <name>-grpc-tls Certificate and the Secret cert-manager leaves behind. The public certificate covers the host.
  • grpcPublicURL status derives from PublicHostgrpcs://<public host>.
  • Tests updated; go build/vet, gofmt, and go test ./... (incl. TestGatewayNginxConfigMatchesChart) pass.

Phase 2 — Server provisioner (Elixir, server/)

  • regions.ex: the managed-region gRPC host template now equals the public host template (env-suffix handling unchanged). One change drives both the CR’s grpcPublicHost == publicHost and grpc_public_urlgrpcs://<public host>.
  • kubernetes_controller.ex: @manifest_revision bumped so the reconciler re-applies the manifest to all existing instances.
  • Tests updated.
  • gRPC readiness check: intentionally skipped. The existing HTTP /up probe (kura.ex) now covers the whole endpoint — gRPC shares the same host, pod, TLS, and nginx server block, so a 200 on /up already proves the server block is live.

Phase 3 — CLI (Swift, cli/)

  • No changes needed. PR #11344 (which added the grpc. host prefix + TUIST_CACHE_GRPC_ENDPOINT override) is not in this branch. BazelSetupCommandService already scheme-swaps on the same host (https://<host>grpcs://<host>) and registers the credential helper for that same host; its tests already assert this. Confirmed there is no grpc.-prefix logic anywhere in cli/Sources.

Intent layer

  • infra/kura-controller/AGENTS.md updated to describe the single-host model.

Phase 4 — Rollout & backward compatibility

The migration is infra-only and there are no stranded clients:

  • The CLI already derives the gRPC endpoint by scheme-swapping the HTTP cache host; it never reads the server’s grpc_public_url.
  • Tuist.Kura.Provisioner.grpc_public_url/2 has zero callers in the codebase — the legacy grpc.<host> value is only ever materialized as an Ingress/DNS/cert by the operator.

Sequence:

  1. Ship infra (Phases 1–2). The single <name>-grpc Ingress object is updated in place (same object name) from host grpc.<host>/path / to host <host>/REAPI path prefixes. external-dns prunes the now-orphaned grpc.<host> A record once that host leaves the Ingress; the retired gRPC cert/secret are cleaned up by the controller. Because gRPC now lands on a host the CLI was already targeting, this is a pure improvement with no regression window.
  2. CLI requires no change (Phase 3).
  3. No separate “remove legacy” step is required, since nothing kept serving grpc.<host>.

Enterprise KuraGateway: automatically covered. The gateway controller creates no Ingress objects — it only runs a dedicated ingress-nginx for a dedicated IngressClass; the co-hosting change lives on the KuraInstance ingresses, which inherit that class.

Validation run

  • infra/kura-controller: gofmt -l clean, go vet ./..., go build ./..., go test ./... all pass (in golang:1.25).
  • Phase 0 end-to-end spike on kind + ingress-nginx v1.11.3 (HTTP + gRPC unary/streaming/reflection, single cert, nginx.conf inspected) — PASS. Throwaway cluster torn down.
  • server/: changed .ex/.exs files parse cleanly (Code.string_to_quoted!). Could not run mix test/credo/format locally — this worktree isn’t bootstrapped (deps//_build/ absent) and the 1Password-provisioned priv/secrets/test.key is not present, which the app boot in test_helper.exs requires. Relying on CI for the server suite.
  • The gateway’s http2_body_preread/buffer tuning is unchanged (it lives in the gateway ConfigMap/Helm values, applied controller-wide regardless of host/path), so large gRPC uploads keep the same tuning on the co-hosted location.
Comments
E
esnunes Jun 19, 2026

Did a deeper pass with the on-prem story in mind. Two findings I’d want to address before this lands, the rest are smaller.

On-prem follow-ups (not necessarily blocking, but worth landing alongside or right after):

  • infra/helm/tuist/values.yaml:27 still describes tlsClusterIssuer as covering “the per-instance public HTTPS and gRPC TLS Certificates” (plural). After this PR there’s one Cert per instance; the comment will mislead operator inventory tooling (kyverno/falco/GitOps) into expecting <name>-grpc-tls and possibly fighting our delete.
  • Nothing in server/priv/docs/en/guides/server/self-host/ mentions the single-host model, the cert-manager requirement on the controller path, the ingress-nginx requirement, or DNS migration steps for operators with their own grpc.<host> records. Worth a short page or section before this lands.
  • Standalone Kura chart at kura/ops/helm/kura/values.yaml:63 and kura/README.md:442 still document the split-host pattern (kura-grpc.local/kura-grpc.example.com). Out of scope for this PR, just flagging so the public Kura deployment story doesn’t drift from what we ship.

Inline comments below.

I’ve added a comment in the values.yaml and created the following issues to tackle the other two items: