Cross-run flaky detection used to mark every run in a contradicting group as flaky, including the passing runs. As a result, a single test failure on a commit could inflate flaky_run_count, flakiness_rate, and the overview flakiness card by the total number of runs on that commit. With this fix, only the non-reproducing failure is flagged as flaky; the passing runs that prove the test can succeed stay clean. This keeps auto-quarantine thresholds from tripping early when a flaky commit is retried repeatedly, and lets the Flaky Runs tab, the single-run view, and PR comments show an accurate pass/fail split. Historical data is not backfilled, so existing inflated counts remain until they age out of their evaluation windows.
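The difference between the old and new behavior can be illustrated with a minimal sketch. This is not the actual implementation: the `Run` record, the `flaky_run_ids` function, the `flag_passing_runs` switch, and the assumption that a contradicting group is the set of runs of one test on one commit with mixed outcomes are all hypothetical, chosen only to show why the counts were inflated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape, not the real schema.
    run_id: int
    test: str
    commit: str
    passed: bool


def flaky_run_ids(runs, flag_passing_runs=False):
    """Return the IDs of runs flagged as flaky.

    flag_passing_runs=True mimics the old behavior (every run in a
    contradicting group is flagged); the default mimics the fix
    (only the failing runs are flagged).
    """
    # Assumption: runs are grouped per test and commit.
    groups = defaultdict(list)
    for run in runs:
        groups[(run.test, run.commit)].append(run)

    flaky = set()
    for group in groups.values():
        outcomes = {run.passed for run in group}
        if len(outcomes) < 2:
            continue  # all passed or all failed: nothing contradicts
        for run in group:
            if flag_passing_runs or not run.passed:
                flaky.add(run.run_id)
    return flaky


# One failure followed by four passing retries on the same commit.
runs = [Run(1, "testLogin", "abc123", False)] + [
    Run(i, "testLogin", "abc123", True) for i in range(2, 6)
]
old_ids = flaky_run_ids(runs, flag_passing_runs=True)
new_ids = flaky_run_ids(runs)
print(len(old_ids), len(new_ids))
```

In this example the old behavior flags all five runs, while the new behavior flags only the single failing run, so four retries no longer count toward a quarantine threshold.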
Hive
Flaky-run detection now counts only failing runs
Published
Jun 23, 2026 · 16:27 UTC
Repository
tuist/tuist