
feat(specs): backfill and refresh revision summaries

GitHub issue · Closed

Metadata
Source: tuist/hive #52
Updated: Jun 24, 2026
Domains: Hive
Details

Summary

  • Refresh open spec pages when the async revision summary worker stores an agent-written summary.
  • Fall back to a plain Condukt run when structured revision summary output is not submitted, so current revisions can still get useful summaries.
  • Show an explicit pending message while agent summaries are enabled instead of presenting the additions/removals fallback as final.
  • Add a periodic revision summary sweeper that backfills missing summaries by spawning one worker per revision, and sanitize worker errors before Oban stores them.
  • Update Tuist production to Fireworks’ current Kimi K2.7 Code model through ReqLLM’s native provider: fireworks_ai:accounts/fireworks/models/kimi-k2p7-code.
  • Scale the Hive production worker pool from 2 to 4 workers so new app pods can run on fresh egress IPs that Fireworks accepts.
  • Persist the CNPG WAL archive safety-check override required for Hive’s existing dedicated backup path after the production timeline change.
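
The sweeper and error-sanitizing behavior described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Elixir/Oban code in this PR: the `Revision` type, the `sweep` and `sanitize_error` names, the redaction patterns, and the 200-character cap are all hypothetical and only show the shape of "one job per revision without a summary" and "scrub errors before they are stored".

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Revision:
    # Hypothetical stand-in for a spec revision record.
    id: int
    summary: Optional[str] = None  # None means no summary is stored yet


# Assumed patterns for secret-looking values; the real rules may differ.
_SECRET_PATTERNS = [
    re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"),
    re.compile(r"(?i)(api[_-]?key=)\S+"),
]


def sanitize_error(message: str, limit: int = 200) -> str:
    """Mask secret-looking values, then cap the length of the stored error."""
    for pattern in _SECRET_PATTERNS:
        message = pattern.sub(r"\1[REDACTED]", message)
    return message[:limit]


def sweep(revisions: Iterable[Revision], enqueue: Callable[[int], None]) -> List[int]:
    """Enqueue one summary job per revision that has no stored summary."""
    enqueued = []
    for revision in revisions:
        if revision.summary is None:
            enqueue(revision.id)  # one worker job per revision
            enqueued.append(revision.id)
    return enqueued
```

Calling `sweep` on a periodic schedule with an `enqueue` callback that inserts a job keeps the backfill idempotent at the selection step: revisions that already have a summary are skipped on every pass.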

Production diagnosis

  • Production has 31 eligible revisions with 0 stored summaries, and recent revision summary jobs were discarded after Fireworks returned HTTP 403.
  • Models.dev and Fireworks list Kimi K2.7 Code under accounts/fireworks/models/kimi-k2p7-code; the configured Hive Fireworks key can call it successfully outside the cluster.
  • The pod’s HIVE_LLM_API_KEY matches the current 1Password item by hash, so this is not secret drift.
  • Requests from the original Hive worker IPs 178.105.102.177 and 178.105.115.239 receive Fireworks’ HTML HTTP 403 before API-key authentication is evaluated, even when no Authorization header is sent.
  • Atlas production can call the same Fireworks route and accounts/fireworks/models/kimi-k2p7-code successfully from its app pods, so the issue was specific to Hive’s original cluster egress IPs.
  • I scaled Hive’s CAPI worker pool to 4 workers. The new worker IPs 138.199.154.38 and 167.233.74.44 return Fireworks’ normal JSON HTTP 401 without auth, confirming they are not blocked at the edge.
  • I verified an authenticated production call from the new worker with accounts/fireworks/models/kimi-k2p7-code; Fireworks returned HTTP 200.
  • I cordoned the two blocked workers and restarted deploy/hive; the two live Hive app pods are now split across the two new accepted workers.
  • hive-postgres-1 was failing because CNPG was blocked by barman-cloud-check-wal-archive, which returned “Expected empty archive” after a timeline change on the existing dedicated archive path. I applied cnpg.io/skipEmptyWalArchiveCheck=enabled, let the WAL backlog drain, moved both Postgres instances to the fresh workers, and verified that a manual backup completed.
  • The currently deployed app still has the pre-PR openai:accounts/fireworks/models/kimi-k2p5 config until this branch is merged and deployed, but a live Hive.Agents.Sessions call now succeeds from production after the egress rotation.
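
The reasoning used in the probes above (an HTML 403 means the request was rejected at the edge, while a JSON 401 means it reached the API and only lacked credentials) can be written down as a small classifier. This is a sketch under assumptions: the `ProbeResult` and `classify_probe` names are hypothetical, and the `text/html` and `application/json` media types are assumed from the "HTML" and "JSON" descriptions rather than taken from captured response headers.

```python
from enum import Enum


class ProbeResult(Enum):
    EDGE_BLOCKED = "edge_blocked"    # rejected before API-key authentication
    REACHES_API = "reaches_api"      # reached the API, but no valid credentials
    AUTHENTICATED = "authenticated"  # authenticated call succeeded
    UNKNOWN = "unknown"


def classify_probe(status: int, content_type: str) -> ProbeResult:
    """Classify a probe response by status code and media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if status == 403 and media_type == "text/html":
        return ProbeResult.EDGE_BLOCKED
    if status == 401 and media_type == "application/json":
        return ProbeResult.REACHES_API
    if status == 200:
        return ProbeResult.AUTHENTICATED
    return ProbeResult.UNKNOWN
```

Sending the unauthenticated probe first is the useful part of the design: it separates an IP-level block from a key problem without putting the API key on the wire.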

Testing

  • mix test test/hive/specs/revision_summary_sweeper_test.exs test/hive/specs/revision_summary_worker_test.exs test/hive/specs/revision_summaries_test.exs test/hive_web/live/spec_live/show_test.exs
  • mix compile --warnings-as-errors
  • mix test test/hive/specs/revision_summary_worker_test.exs
  • helm template hive infra/helm/hive -f infra/helm/hive/values-production.yaml
  • KUBECONFIG=/Users/pepicrft/.kube/tuist-mgmt.yaml kubectl apply --dry-run=server -f infra/k8s/cluster-production.yaml
  • Production Fireworks probes from old and new worker nodes, plus one live Hive.Agents.Sessions call through hive rpc.
  • Production CNPG remediation: verified hive-postgres healthy with 2 ready instances, zero queued WALs, streaming replication active, and completed hive-postgres-manual-20260619 backup.
Comments
github-actions[bot] Jun 19, 2026

Blick review didn’t run

The blick review step failed before producing a manifest, so there’s no review to post on this PR. This usually means the agent (opencode) couldn’t start — common causes are an expired or suspended model API key, a missing secret, or the workflow timing out.

See the workflow run for details: https://github.com/tuist/hive/actions/runs/27812715214

Commit: 3e6f1db54b82e5fd7a907faf76eef3d90f9d10ba