Summary
This fixes the shard bundle upload 404 by removing the immediate reference lookup from the new-client upload path.
The flow is now:
- Create the shard plan.
- The server always returns the shard plan id and an upload_url for starting the shard bundle upload.
- If the CLI needs to upload test products, it starts the upload with the returned shardPlan.id.
- Upload URL generation and completion also use the returned shardPlan.id.
The existing reference-based upload endpoints remain supported for older clients.
Root cause
The failing CI run hit:
Failed to start shard upload due to an unknown server response of 404.
The shard plan creation itself succeeded. The problem was the request sequence immediately after creation:
- createShardPlan inserted the plan through IngestRepo.
- The CLI made a second request to startShardUpload using only the shard reference.
- startShardUpload looked the plan up through ClickHouseRepo by reference.
- In production, that immediate read is not guaranteed to see the just-ingested plan yet.
- The lookup returned nil, which the API mapped to 404.
So the failure was not a client cache configuration problem and not something that should be papered over by --no-binary-cache or local shard archives. It was a server-side read-after-write gap between the ingest/write path and the read/query path.
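The gap can be illustrated with a toy model. This is a minimal sketch in Python, not the server's Elixir code: BufferedStore, its explicit flush() step, and the function names are hypothetical stand-ins for the ingest path and the query path, chosen only to show why a lookup by reference can miss a plan that was just created while a start by plan id cannot.

```python
import uuid


class BufferedStore:
    """Toy store (hypothetical): writes only become readable after flush()."""

    def __init__(self):
        self._pending = []   # written, but not yet visible to readers
        self._visible = {}   # reference -> plan id, as seen by the read path

    def insert(self, reference, plan_id):
        self._pending.append((reference, plan_id))

    def flush(self):
        for reference, plan_id in self._pending:
            self._visible[reference] = plan_id
        self._pending.clear()

    def get_by_reference(self, reference):
        return self._visible.get(reference)


def create_shard_plan(store, reference):
    # The write succeeds and the caller gets a stable id back immediately.
    plan_id = str(uuid.uuid4())
    store.insert(reference, plan_id)
    return plan_id


def start_upload_by_reference(store, reference):
    # Old path: depends on the read side having caught up with the write.
    if store.get_by_reference(reference) is None:
        return 404
    return 200


def start_upload_by_plan_id(plan_id):
    # New path: no lookup is needed because the id is already known.
    return 200


store = BufferedStore()
plan_id = create_shard_plan(store, "ci-run-1")
immediate_status = start_upload_by_reference(store, "ci-run-1")  # read before flush
plan_id_status = start_upload_by_plan_id(plan_id)
store.flush()
later_status = start_upload_by_reference(store, "ci-run-1")      # read after flush
```

In this model the reference lookup fails only in the window between insert and flush, which matches the intermittent nature of the CI failure.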
What Changed
Server API
- Removed the start_upload flag from shard plan creation.
- Replaced ShardPlan.upload_id with ShardPlan.upload_url, which the server returns for new clients but keeps optional in the schema for compatibility.
- Changed startShardUpload to accept either shard_plan_id or reference.
- Kept reference support for older clients.
- Kept shard_plan_id support for upload URL generation and completion.
- Return 400 when neither shard_plan_id nor reference is provided to upload start, URL generation, or completion endpoints.
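The parameter handling described above can be sketched roughly as follows. This is an illustrative Python sketch, not the controller code: resolve_plan_lookup is a hypothetical helper, and preferring shard_plan_id when both parameters are present is an assumption that this description does not state.

```python
def resolve_plan_lookup(params):
    """Return (status, lookup) for an upload start/generate/complete request.

    Hypothetical helper. Assumption: shard_plan_id wins if both are sent.
    """
    plan_id = params.get("shard_plan_id")
    reference = params.get("reference")
    if plan_id:
        # New clients: use the id returned by plan creation, no read needed.
        return 200, ("plan_id", plan_id)
    if reference:
        # Older clients: compatibility path that looks the plan up by reference.
        return 200, ("reference", reference)
    # Neither identifier was provided.
    return 400, None


new_client = resolve_plan_lookup({"shard_plan_id": "abc"})
old_client = resolve_plan_lookup({"reference": "ci-run-1"})
missing = resolve_plan_lookup({})
```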
Server Domain Logic
- Added start_upload_for_plan_id/3, which starts storage directly from the known plan id.
- Kept start_upload/3 as the reference-based compatibility path for older clients.
- The new CLI path never needs to query ClickHouse immediately after plan creation.
CLI
- ShardPlanService now creates the plan exactly once.
- Archive-only and skip-upload flows stop after plan creation, as before.
- Uploading flows call StartShardUploadService.startUpload with shardPlan.id.
- Removed CreateShardPlanService.createShardPlanAndStartUpload, ShardPlanUpload, startUpload: true, and the extra wrapper state in ShardPlanService.
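The resulting CLI control flow can be summarized with a small sketch. It is written in Python for brevity rather than Swift, and run_shard_plan, its parameters, and the fake services are hypothetical; the only points it is meant to show are that the plan is created once and that an upload starts only in uploading flows, using the returned plan id.

```python
def run_shard_plan(create_plan, start_upload, *, skip_upload=False, shard_archive_path=None):
    """Hypothetical outline of the flow: create once, then optionally upload."""
    plan = create_plan()
    if skip_upload or shard_archive_path is not None:
        # Archive-only and skip-upload flows stop after plan creation.
        return plan, None
    # Uploading flows start the upload with the id returned by creation.
    return plan, start_upload(plan["id"])


calls = {"create": 0, "start": []}


def fake_create_plan():
    calls["create"] += 1
    return {"id": "plan-1"}


def fake_start_upload(plan_id):
    calls["start"].append(plan_id)
    return {"upload_for": plan_id}


skipped = run_shard_plan(fake_create_plan, fake_start_upload, skip_upload=True)
uploaded = run_shard_plan(fake_create_plan, fake_start_upload)
```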
Generated API Client
- Regenerated the OpenAPI spec and Swift client from the server schema changes.
Why This Approach
This keeps the model simple:
- Plan creation creates a plan.
- Upload start starts an upload.
- The connection between both operations is the stable plan id returned by creation.
That avoids the original race without starting unused multipart uploads for --skip-upload or --shard-archive-path. It also avoids encoding timing assumptions into retries or sleeps.
The tempting alternatives were worse:
- Retrying get_plan in startShardUpload would encode an eventual-consistency race into the request path.
- Disabling binary cache or forcing --shard-archive-path would avoid the symptom in one workflow, but would not fix the server behavior.
- Starting upload from createShardPlan via start_upload: true coupled plan creation to the caller’s next action and made the CLI contract more complicated than necessary.
- Falling back from plan-id upload start to reference-based upload start would reintroduce the exact lookup race this change removes.
Compatibility and Impact
- New CLI + new server uses plan-id upload start.
- Older CLI + new server continues to use reference-based upload start/generate/complete.
- No database schema changes are introduced.
- No new customer data category or retention behavior is introduced, so server/data-export.md does not need an update.
Validation
Ran:
mise run generate-api-cli-code
mix format lib/tuist/shards.ex lib/tuist_web/controllers/api/shards_controller.ex lib/tuist_web/api/schemas/shards/shard_plan.ex test/tuist/shards_test.exs test/tuist_web/controllers/api/shards_controller_test.exs
swiftformat cli/Sources/TuistServer/Services/CreateShardPlanService.swift cli/Sources/TuistServer/Services/StartShardUploadService.swift cli/Sources/TuistKit/Services/Sharding/ShardPlanService.swift cli/Sources/TuistKit/Services/TestService.swift cli/Tests/TuistKitTests/Services/Sharding/ShardPlanServiceTests.swift cli/Tests/TuistKitTests/Services/Sharding/ShardMatrixOutputServiceTests.swift cli/Tests/TuistKitTests/Services/TestServiceTests.swift cli/Tests/TuistKitTests/Services/XcodeBuildBuildCommandServiceTests.swift
mix test test/tuist/shards_test.exs test/tuist_web/controllers/api/shards_controller_test.exs
git diff --check
Result:
44 tests, 0 failures
I also scanned for stale startUpload:, start_upload, createShardPlanAndStartUpload, and ShardPlanUpload references. The only remaining start_upload references are the server upload-start endpoint/function names.
Swift/Xcode Validation Note
I did not rerun the focused Swift xcodebuild test in this follow-up. Earlier attempts in this worktree failed before compilation because Tuist.xcworkspace is not currently a generated workspace file and CoreSimulator is unavailable in the sandbox.