feat(server): add reliability automation monitor
GitHub issue · Closed
What changed
- Added `reliability_rate` as an automation monitor type alongside flakiness rate and flaky run count.
- Wired reliability monitor evaluation through alert validation, worker dispatch, API schemas/controllers, and the Project Automations LiveView UI.
- Added ClickHouse success-count aggregate state for daily and rolling per-test-case evaluation, with backfills and companion materialized views.
- Updated data export documentation and gettext templates for the new automation UI strings.
- Added focused tests for alert validation, monitor evaluation, worker dispatch, API creation, and LiveView form defaults.
Why it changed
Tests that fail nearly every run can have very low reliability while still not matching Tuist’s flaky-test definition, because flakiness is based on flaky runs rather than a simple success-rate drop. This adds the automation Curtis asked for: quarantine/mute/skip tests when their reliability falls below a configured threshold.
Root cause
The existing automation model could react to flakiness rate or flaky run count, but it did not have a monitor that measured successful runs over total runs. Very unreliable tests with no flaky runs were therefore invisible to automatic quarantine rules.
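The gap described above can be illustrated with a small sketch. It is not the server's code: the `Run` record, its `flaky` flag, and both helper functions are hypothetical names. The only thing taken from the description is that reliability is successful runs over total runs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one execution of a test case."""
    passed: bool  # final outcome of the run
    flaky: bool   # assumed flag: the run was classified as flaky


def reliability_rate(runs: Sequence[Run]) -> Optional[float]:
    """Successful runs over total runs, as a percentage (None if no runs)."""
    if not runs:
        return None
    return 100.0 * sum(1 for r in runs if r.passed) / len(runs)


def flaky_run_count(runs: Sequence[Run]) -> int:
    """Number of runs classified as flaky."""
    return sum(1 for r in runs if r.flaky)


# A test that fails on every run and is never classified as flaky.
always_failing = [Run(passed=False, flaky=False) for _ in range(20)]

print(reliability_rate(always_failing))  # 0.0
print(flaky_run_count(always_failing))   # 0
```

With these assumed definitions, a test that fails consistently has a flaky run count of zero, so a flakiness-based rule never fires, while its reliability rate is as low as it can be.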
Impact
Project maintainers can now create automations such as “reliability rate < 90% over 30 days” or over the last N runs. Existing flakiness and event-driven automations keep their current behavior.
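As a rough sketch of how a rule such as "reliability rate < 90% over 30 days" or "over the last N runs" might be evaluated, the following assumes runs are available as `(timestamp, passed)` pairs. The function names, the inclusive window cutoff, and the choice not to trigger on an empty window are all assumptions made for illustration, not behavior confirmed by this change.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

RunRecord = Tuple[datetime, bool]  # (finished_at, passed); hypothetical shape


def select_window(runs: List[RunRecord], now: datetime,
                  days: Optional[int] = None,
                  last_n: Optional[int] = None) -> List[RunRecord]:
    """Pick the runs a monitor looks at: a rolling day window or the last N runs."""
    ordered = sorted(runs, key=lambda r: r[0])
    if days is not None:
        cutoff = now - timedelta(days=days)
        return [r for r in ordered if r[0] >= cutoff]  # assumed inclusive cutoff
    if last_n is not None:
        return ordered[-last_n:] if last_n > 0 else []
    return ordered


def monitor_triggers(runs: List[RunRecord], threshold_pct: float, now: datetime,
                     days: Optional[int] = None,
                     last_n: Optional[int] = None) -> bool:
    """True when the reliability rate in the window is below the threshold."""
    window = select_window(runs, now, days=days, last_n=last_n)
    if not window:
        return False  # assumption: no data means no action
    rate = 100.0 * sum(1 for _, passed in window if passed) / len(window)
    return rate < threshold_pct


now = datetime(2024, 1, 31)
# Ten recent runs, the two most recent failed: 80% reliability.
recent = [(now - timedelta(days=i), i >= 2) for i in range(10)]
print(monitor_triggers(recent, 90.0, now, days=30))  # True
```

The two window modes can disagree for the same test: failures older than the day window are ignored by the time-based rule but may still count under a last-N rule, which is one reason to offer both.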
Validation
- `mix format`
- `mix gettext.extract`
- `git diff --check`
- Focused server tests: `mix test test/tuist/automations/alerts/alert_test.exs test/tuist/automations/monitors/flaky_tests_monitor_test.exs test/tuist/automations/workers/alert_evaluation_worker_test.exs test/tuist_web/controllers/api/automations/alerts_controller_test.exs test/tuist_web/live/project_automations_live_test.exs` (107 tests, 0 failures)