Bug 2007112 - Perform a binary search backfill with the sheriffing bot with change detection technique integration#9237
Conversation
…t with change detection technique integration
beatrice-acasandrei
left a comment
There was a problem hiding this comment.
Awesome work! Could you add some inline comments for the main methods that you updated? Since a significant portion of the core logic resides on the m-c side, could we add a high-level summary of the cross-project workflow? It would be very helpful to document which actions are performed in m-c first and exactly how those results influence the logic in Perfherder.
| iteration_count = models.IntegerField(default=0) | ||
| last_detected_push_id = models.IntegerField(null=True, blank=True) | ||
| anchor_push_id = models.IntegerField(null=True, blank=True) | ||
| backfill_logs = models.TextField(default="[]") |
There was a problem hiding this comment.
I think adding some more context via comments would be helpful.
For example :
- iteration_count: Tracks search iterations.
- last_detected_push_id: The most recent culprit identification.
- anchor_push_id: The reference point for searches.
- backfill_logs: Detailed iteration history.
| (READY_FOR_PROCESSING, "Ready for processing"), | ||
| (BACKFILLED, "Backfilled"), | ||
| (SUCCESSFUL, "Successful"), | ||
| (FAILED, "Failed"), |
There was a problem hiding this comment.
It may be useful for debugging to have a separate status for the verification process (VERIFICATION_FAILED, "Verification Failed")
| logger.error(ex) | ||
|
|
||
| def verify_and_iterate(self, record: BackfillRecord, max_iterations: int = 5): | ||
| if record.iteration_count >= max_iterations: |
There was a problem hiding this comment.
Should we consider adding a timeout check alongside with the max iterations check?
Hi there :)
I've integrated the smart backfill logic in this patch.
I haven't added unit tests yet. I'd like to confirm that the overall direction and policy make sense first. Once that’s agreed on, I’ll follow up with tests.
Please let me know if you have any questions or if there’s anything you’d like me to change.
Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=2007112
Related Patch: https://phabricator.services.mozilla.com/D282605
Features
BackfillRecordtable.iteration_countlast_detected_push_idanchor_push_idbackfill_logs[1]In the example below [1], the initial alert point is push
10(previous_push_id: 10). After backfilling, verification detects changes at pushes9and14. Since push9is closer to the previous alert point (10), it is selected as the culprit.In this example, the search window is intentionally kept small for testing purposes. In production, we typically use a wider search window (around 12–24 pushes).
(Note: In this example, push
9represents a regression, while push14represents an improvement.)Ideally, it should match the number of pushes triggered by the real backfill action on the mozilla-central side. Also, if we need to manage this value, we can consider adding a new column for it.
verify_and_iteratemethod), the backfill record status remainsSUCCESSFUL. We could introduce a new status to represent this case more explicitly, or alternatively change the status toFAILED. I left this as a policy decision to be discussed.re_run_detect_changesmethod is based on the alert generation logic [2], in order to recreate the same environment used when alerts are originally generated.[0]
treeherder/treeherder/perf/models.py
Lines 887 to 891 in ce3a39b
[1] Example
[2]
treeherder/treeherder/perf/alerts.py
Lines 59 to 126 in ce3a39b