
fix(server): start shard uploads by plan id

GitHub issue · Closed

Metadata
Source: tuist/tuist #11360
Updated: Jun 24, 2026
Domains: Kura

Summary

This fixes the shard bundle upload 404 by removing the immediate reference lookup from the new-client upload path.

The flow is now:

  1. Create the shard plan.
  2. The server always returns the shard plan id and an upload_url for starting the shard bundle upload.
  3. If the CLI needs to upload test products, it starts the upload with the returned shardPlan.id.
  4. Upload URL generation and completion also use the returned shardPlan.id.
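The four steps above can be sketched as a small in-memory model. This is a minimal illustration, not Tuist's code: FakeShardServer, its method names, the call log, and the example.invalid URLs are all hypothetical. The point it shows is that after the single create call, every later request is keyed by the returned plan id and never by reference.

```swift
// Hypothetical in-memory stand-in for the shard endpoints; not Tuist's real API.
struct ShardPlan {
    let id: String
    let reference: String
    let uploadURL: String?  // returned for new clients, optional for compatibility
}

final class FakeShardServer {
    private var nextId = 1
    private(set) var calls: [String] = []  // every request, in order

    // Steps 1-2: create the plan and hand back its id plus an upload URL.
    func createShardPlan(reference: String) -> ShardPlan {
        let id = "plan-\(nextId)"
        nextId += 1
        calls.append("create:\(reference)")
        return ShardPlan(id: id, reference: reference,
                         uploadURL: "https://example.invalid/shard-plans/\(id)/upload")
    }

    // Step 3: start the upload keyed by the plan id, not by reference.
    func startShardUpload(shardPlanId: String) {
        calls.append("start:\(shardPlanId)")
    }

    // Step 4: part URL generation and completion are keyed by the same id.
    func generateUploadURL(shardPlanId: String, partNumber: Int) -> String {
        calls.append("part-url:\(shardPlanId):\(partNumber)")
        return "https://example.invalid/shard-plans/\(shardPlanId)/parts/\(partNumber)"
    }

    func completeShardUpload(shardPlanId: String) {
        calls.append("complete:\(shardPlanId)")
    }
}

// The CLI side: one create call, then everything reuses plan.id.
func runNewClientFlow(server: FakeShardServer, reference: String,
                      uploadsTestProducts: Bool) -> ShardPlan {
    let plan = server.createShardPlan(reference: reference)
    guard uploadsTestProducts else { return plan }
    server.startShardUpload(shardPlanId: plan.id)
    _ = server.generateUploadURL(shardPlanId: plan.id, partNumber: 1)
    server.completeShardUpload(shardPlanId: plan.id)
    return plan
}

let server = FakeShardServer()
let plan = runNewClientFlow(server: server, reference: "ci-run-42", uploadsTestProducts: true)
print(server.calls)
```

Because the id comes straight from the create response, no step depends on the server being able to read back what it just wrote.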

The existing reference-based upload endpoints remain supported for older clients.

Root cause

The failing CI run hit:

Failed to start shard upload due to an unknown server response of 404.

The shard plan creation itself succeeded. The problem was the request sequence immediately after creation:

  1. createShardPlan inserted the plan through IngestRepo.
  2. The CLI made a second request to startShardUpload using only the shard reference.
  3. startShardUpload looked the plan up through ClickHouseRepo by reference.
  4. In production, that immediate read is not guaranteed to see the just-ingested plan yet.
  5. The lookup returned nil, which the API mapped to 404.
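The failing sequence can be reproduced with a toy model of the read-after-write gap. This is a sketch under stated assumptions: LaggingPlanStore, StoredPlan, and replicate() are hypothetical stand-ins for the IngestRepo write path and the ClickHouseRepo read path, and the explicit replicate() call replaces whatever real, untimed propagation happens in production.

```swift
// Hypothetical model: writes land in an ingest store, reads go to a query store
// that only sees rows after replicate() runs.
struct StoredPlan {
    let id: String
    let reference: String
}

final class LaggingPlanStore {
    private var ingested: [StoredPlan] = []   // write path (IngestRepo stand-in)
    private var queryable: [StoredPlan] = []  // read path (ClickHouseRepo stand-in)

    func insert(_ plan: StoredPlan) { ingested.append(plan) }

    // Simulates the read path catching up; in production this timing is not guaranteed.
    func replicate() { queryable = ingested }

    func plan(byReference reference: String) -> StoredPlan? {
        queryable.first { $0.reference == reference }
    }
}

enum StartUploadResponse: Equatable {
    case ok(planId: String)
    case notFound  // what the API mapped to HTTP 404
}

// Reference-based start: depends on the read path already seeing the plan.
func startShardUpload(reference: String, store: LaggingPlanStore) -> StartUploadResponse {
    guard let plan = store.plan(byReference: reference) else { return .notFound }
    return .ok(planId: plan.id)
}

let store = LaggingPlanStore()
store.insert(StoredPlan(id: "plan-1", reference: "ci-run-42"))
let immediate = startShardUpload(reference: "ci-run-42", store: store)  // right after creation
store.replicate()
let later = startShardUpload(reference: "ci-run-42", store: store)      // after the read path catches up
print(immediate, later)
```

The same request succeeds or fails depending only on timing, which is why the fix removes the lookup instead of retrying it.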

So the failure was not a client cache configuration problem and not something that should be papered over by --no-binary-cache or local shard archives. It was a server-side read-after-write gap between the ingest/write path and the read/query path.

What Changed

Server API

  • Removed the start_upload flag from shard plan creation.
  • Replaced ShardPlan.upload_id with ShardPlan.upload_url, which the server returns for new clients but keeps optional in the schema for compatibility.
  • Changed startShardUpload to accept either shard_plan_id or reference.
  • Kept reference support for older clients.
  • Kept shard_plan_id support for upload URL generation and completion.
  • Made the upload start, URL generation, and completion endpoints return 400 when neither shard_plan_id nor reference is provided.
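The parameter handling described in this list can be sketched as one resolver shared by the three endpoints. This is an illustration, not the server's Elixir code: UploadTarget, BadRequest, resolveUploadTarget, and the error message are hypothetical, and preferring shard_plan_id when a request carries both fields is an assumption the PR does not state.

```swift
// Hypothetical resolver for upload start, URL generation, and completion.
enum UploadTarget: Equatable {
    case planId(String)     // sent by new clients
    case reference(String)  // sent by older clients
}

struct BadRequest: Error, Equatable {
    let message: String
}

func resolveUploadTarget(shardPlanId: String?, reference: String?) -> Result<UploadTarget, BadRequest> {
    if let id = shardPlanId {
        return .success(.planId(id))  // assumption: plan id wins when both are present
    }
    if let reference = reference {
        return .success(.reference(reference))
    }
    // Neither field present: reject up front (HTTP 400) instead of looking anything up.
    return .failure(BadRequest(message: "shard_plan_id or reference is required"))
}

let fromNewClient = resolveUploadTarget(shardPlanId: "plan-1", reference: nil)
let fromOldClient = resolveUploadTarget(shardPlanId: nil, reference: "ci-run-42")
let fromNeither = resolveUploadTarget(shardPlanId: nil, reference: nil)
print(fromNewClient, fromOldClient, fromNeither)
```

Keeping both request shapes valid is what lets older CLIs keep working while new CLIs send only the plan id.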

Server Domain Logic

  • Added start_upload_for_plan_id/3, which starts storage directly from the known plan id.
  • Kept start_upload/3 as the reference-based compatibility path for older clients.
  • The new CLI path never needs to query ClickHouse immediately after plan creation.
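The split between the two domain entry points can be rendered in Swift for illustration. The real functions are Elixir (start_upload_for_plan_id/3 and start_upload/3); ShardUploads, MultipartUpload, the queryCount counter, and the "mpu-" upload ids below are hypothetical. The sketch shows that the plan-id path never consults the read path, while the compatibility path still does.

```swift
// Hypothetical Swift rendering of the two server entry points.
struct MultipartUpload: Equatable {
    let planId: String
    let uploadId: String
}

final class ShardUploads {
    var queryablePlanIds: [String: String] = [:]  // reference -> plan id, as the read path sees it
    private(set) var queryCount = 0               // how often the read path was consulted

    // Stand-in for starting a multipart upload in storage, keyed by plan id.
    private func startStorageUpload(planId: String) -> MultipartUpload {
        MultipartUpload(planId: planId, uploadId: "mpu-\(planId)")
    }

    // New path: the caller already holds the plan id, so no read is needed.
    func startUploadForPlanId(_ planId: String) -> MultipartUpload {
        startStorageUpload(planId: planId)
    }

    // Compatibility path for older clients: resolves the reference first.
    func startUpload(reference: String) -> MultipartUpload? {
        queryCount += 1
        guard let planId = queryablePlanIds[reference] else { return nil }
        return startStorageUpload(planId: planId)
    }
}

let uploads = ShardUploads()  // read path has not caught up yet
let viaId = uploads.startUploadForPlanId("plan-1")
let viaReference = uploads.startUpload(reference: "ci-run-42")
uploads.queryablePlanIds["ci-run-42"] = "plan-1"  // read path catches up
let viaReferenceLater = uploads.startUpload(reference: "ci-run-42")
print(viaId, viaReference as Any, viaReferenceLater as Any, uploads.queryCount)
```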

CLI

  • ShardPlanService now creates the plan exactly once.
  • Archive-only and skip-upload flows stop after plan creation, as before.
  • Uploading flows call StartShardUploadService.startUpload with shardPlan.id.
  • Removed CreateShardPlanService.createShardPlanAndStartUpload, ShardPlanUpload, startUpload: true, and the extra wrapper state in ShardPlanService.
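The CLI control flow in this list can be sketched as follows. Only the sequencing follows the PR: ShardPlanFlow, UploadMode, the two protocols, their signatures, and the recording fakes are hypothetical and do not match the real ShardPlanService or StartShardUploadService APIs. Treating archiveOnly as the --shard-archive-path case is also an assumption.

```swift
// Hypothetical sketch: create the plan once, start an upload only when needed.
struct ShardPlan { let id: String }

enum UploadMode {
    case upload       // test products must be uploaded
    case skipUpload   // --skip-upload
    case archiveOnly  // assumed to correspond to --shard-archive-path
}

protocol ShardPlanCreating {
    func createShardPlan(reference: String) throws -> ShardPlan
}

protocol ShardUploadStarting {
    func startUpload(shardPlanId: String) throws -> String
}

struct ShardPlanFlow {
    let creator: any ShardPlanCreating
    let uploader: any ShardUploadStarting

    func run(reference: String, mode: UploadMode) throws -> (plan: ShardPlan, uploadId: String?) {
        let plan = try creator.createShardPlan(reference: reference)  // exactly one create call
        switch mode {
        case .skipUpload, .archiveOnly:
            return (plan, nil)  // stop here: no multipart upload is started
        case .upload:
            let uploadId = try uploader.startUpload(shardPlanId: plan.id)
            return (plan, uploadId)
        }
    }
}

// Recording fakes used only for this example.
final class RecordingCreator: ShardPlanCreating {
    private(set) var calls = 0
    func createShardPlan(reference: String) throws -> ShardPlan {
        calls += 1
        return ShardPlan(id: "plan-\(calls)")
    }
}

final class RecordingUploader: ShardUploadStarting {
    private(set) var startedFor: [String] = []
    func startUpload(shardPlanId: String) throws -> String {
        startedFor.append(shardPlanId)
        return "upload-\(shardPlanId)"
    }
}

let creator = RecordingCreator()
let uploader = RecordingUploader()
let flow = ShardPlanFlow(creator: creator, uploader: uploader)
let uploaded = try flow.run(reference: "ci-run-42", mode: .upload)
let skipped = try flow.run(reference: "ci-run-43", mode: .skipUpload)
print(uploaded, skipped, uploader.startedFor)
```

Because the upload decision happens after creation, skip-upload and archive-only runs never open a multipart upload they would not use.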

Generated API Client

  • Regenerated the OpenAPI spec and Swift client from the server schema changes.

Why This Approach

This keeps the model simple:

  • Plan creation creates a plan.
  • Upload start starts an upload.
  • The connection between both operations is the stable plan id returned by creation.

That avoids the original race without starting unused multipart uploads for --skip-upload or --shard-archive-path. It also avoids encoding timing assumptions into retries or sleeps.

The tempting alternatives were worse:

  • Retrying get_plan in startShardUpload would encode an eventual-consistency race into the request path.
  • Disabling binary cache or forcing --shard-archive-path would avoid the symptom in one workflow, but would not fix the server behavior.
  • Starting upload from createShardPlan via start_upload: true coupled plan creation to the caller’s next action and made the CLI contract more complicated than necessary.
  • Falling back from plan-id upload start to reference-based upload start would reintroduce the exact lookup race this change removes.

Compatibility and Impact

  • New CLI + new server uses plan-id upload start.
  • Older CLI + new server continues to use reference-based upload start/generate/complete.
  • No database schema changes are introduced.
  • No new customer data category or retention behavior is introduced, so server/data-export.md does not need an update.

Validation

Ran:

mise run generate-api-cli-code
mix format lib/tuist/shards.ex lib/tuist_web/controllers/api/shards_controller.ex lib/tuist_web/api/schemas/shards/shard_plan.ex test/tuist/shards_test.exs test/tuist_web/controllers/api/shards_controller_test.exs
swiftformat cli/Sources/TuistServer/Services/CreateShardPlanService.swift cli/Sources/TuistServer/Services/StartShardUploadService.swift cli/Sources/TuistKit/Services/Sharding/ShardPlanService.swift cli/Sources/TuistKit/Services/TestService.swift cli/Tests/TuistKitTests/Services/Sharding/ShardPlanServiceTests.swift cli/Tests/TuistKitTests/Services/Sharding/ShardMatrixOutputServiceTests.swift cli/Tests/TuistKitTests/Services/TestServiceTests.swift cli/Tests/TuistKitTests/Services/XcodeBuildBuildCommandServiceTests.swift
mix test test/tuist/shards_test.exs test/tuist_web/controllers/api/shards_controller_test.exs
git diff --check

Result:

44 tests, 0 failures

I also scanned for stale startUpload:, start_upload, createShardPlanAndStartUpload, and ShardPlanUpload references. The only remaining start_upload references are the server upload-start endpoint/function names.

Swift/Xcode Validation Note

I did not rerun the focused Swift xcodebuild test in this follow-up. Earlier attempts in this worktree failed before compilation because Tuist.xcworkspace is not currently a generated workspace file and CoreSimulator is unavailable in the sandbox.

Comments
tuist[bot] Jun 19, 2026

🛠️ Tuist Run Report 🛠️

Tests 🧪
Scheme | Status | Cache hit rate | Tests | Skipped | Ran | Commit
TuistAcceptanceTests |  | 0 % | 289 | 0 | 289 | 3d2b62b0d
TuistUnitTests |  | 0 % | 4945 | 6 | 4939 | 3d2b62b0d
Failed Tests ❌
  • TuistAcceptanceTests: 4 failed tests
  • TuistUnitTests: 2 failed tests

  • app_with_swiftpm_prebuilt_macro_dependency_generates_prebuilt_macro_settings() · TuistAutomationAcceptanceTests · BuildAcceptanceTestSwiftPMPrebuiltMacro
    (otherSwiftFlags → ["$(inherited)", "-Xcc", "-fmodule-map-file=\"$(SRCROOT)/Derived/ModuleMaps/MacroDependencyMacros-deps.modulemap\""]).contains("-I")

  • shard_with_remote_test_products() · TuistAutomationAcceptanceTests · TestAcceptanceTestShardWithRemoteTestProducts
    .badRequest("Missing field: reference")

  • output_noCI_writesFallbackJSON() · TuistKitTests · ShardMatrixOutputServiceTests
    (content → "{ "id" : "test-id", "reference" : "test-ref", "shard_count" : 2, "shards" : [ { "estimated_duration_ms" : 1000, "index" : 0,…

Showing 5 of 6 failed tests.

Flaky Tests ⚠️
  • TuistUnitTests: 4 flaky tests

Test case | Module | Suite
run_outputsAWarning_when_noHashes() | TuistKitTests | HashSelectiveTestingCommandServiceTests
parseTestStatuses_returnsPassingModuleNames() | TuistXCResultServiceTests | XCResultServiceTests
parseTestStatuses_returnsCorrectStatuses() | TuistXCResultServiceTests | XCResultServiceTests
parseTestStatuses_extractsModuleAndSuiteNames() | TuistXCResultServiceTests | XCResultServiceTests
Builds 🔨
Scheme | Status | Duration | Commit
TuistAcceptanceTests |  | 5m 16s | 3d2b62b0d
TuistUnitTests |  | 5m 2s | 3d2b62b0d
tuist-atlas[bot] Jun 20, 2026

The fix for starting shard uploads by plan id is now available. Update to xcresult-processor-image@0.26.5 to apply it.