Hive
Cache effectiveness trend monitoring
GitHub issue · Open
Why is this needed?
@freak4pc & @natanrolnik reported that it’d be useful to have a way to monitor trends in cache effectiveness. That way, if they notice something has had a negative impact, they could trace it back to figure out what might have introduced a regression.
@freak4pc’s comment:
Our thought would be to detect when the hit rate drops more than a certain amount below the average of the latest X builds (for example, if the last 30 builds on branch Y were at 80% and we drop to 50%, we know some configuration change is most likely responsible). We’re also going to try monitoring build time, as that’s also a big indicator.
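The rolling-baseline check described in that comment could look roughly like the sketch below. It assumes per-build hit rates are available as fractions in build order for a single branch. The function name `hit_rate_regressed` and the `max_drop` threshold of 20 percentage points are hypothetical. The window of 30 builds comes from the example above.

```python
from statistics import mean
from typing import Sequence


def hit_rate_regressed(history: Sequence[float], current: float,
                       window: int = 30, max_drop: float = 0.20) -> bool:
    """Return True if `current` is more than `max_drop` (absolute, as a fraction)
    below the mean hit rate of the last `window` builds in `history`.

    Hypothetical helper: `history` is assumed to hold per-build hit rates
    (0.0-1.0) for one branch, oldest first.
    """
    recent = list(history)[-window:]  # only the latest `window` builds form the baseline
    if not recent:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent)
    return baseline - current > max_drop


# The scenario from the comment: 30 builds at 80%, then a build at 50%.
print(hit_rate_regressed([0.80] * 30, 0.50))  # True
```

An absolute drop in percentage points is only one option. A relative drop, or a threshold derived from the baseline's standard deviation, would react differently on branches with noisy hit rates.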
We already have graphs for monitoring cache hit rate – is this for the Slack alerts?
Not the absolute values, but trend detection. For example, alerting when the effectiveness p95 drops (or spikes) compared to the previous period. Think “your cache hit rate degraded by 15% this week” rather than “your cache hit rate is 73%.”
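A period-over-period version of that trend detection might look like the following sketch. Several details are assumptions. Each period is represented as a list of per-build hit rates. The p95 uses the nearest-rank method. The "15%" is read as a relative change in p95 rather than percentage points. The names `p95`, `trend_change` and `trend_alert` are hypothetical.

```python
import math
from typing import Optional, Sequence


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def trend_change(previous: Sequence[float], current: Sequence[float]) -> float:
    """Relative change of the p95 hit rate between two periods (negative = degradation)."""
    before = p95(previous)
    if before == 0:
        raise ValueError("previous p95 is zero; relative change is undefined")
    return (p95(current) - before) / before


def trend_alert(previous: Sequence[float], current: Sequence[float],
                threshold: float = 0.15) -> Optional[str]:
    """Return an alert message when the p95 moves by at least `threshold`, else None."""
    change = trend_change(previous, current)
    if abs(change) < threshold:
        return None
    direction = "degraded" if change < 0 else "improved"
    return f"Your cache hit rate p95 {direction} by {abs(change):.0%} compared to the previous period"


print(trend_alert([0.80] * 20, [0.60] * 20))
```

Flagging both directions matches the "drops (or spikes)" wording. A sudden jump can point to a configuration change just as much as a drop can.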
The idea behind the issue was to first compute those trends and surface them in the dashboard.
Got it, totally agree. It reminds me of the home view in the Shopify dashboard, which shows you, first, alerts and info you should be aware of and, second, the most useful analytics. @asmitbm it would be great if you could put together a design for this.
And yes, we can build on top of this feature for Slack alerts (or we can start with Slack alerts first; it doesn’t matter too much). The data is already there, so we only need a way to surface changes in the trend in the dashboard (or via Slack or other external tools).