I have been reviewing the TestBot and GitLab CI test results for the merged MRs. While doing that I updated the TestBot's known failures list (https://testbot.winehq.org/FailuresList.pl) in order to drive down the false positive rate.
Incidentally I also collected the list of test units causing false positives, so I'll start with that. Specifically, here are the bugs to fix to help the GitLab CI:
* Bug 53433 - mmdevapi:capture - impacted 18 MRs
* Bug 54064 - ntdll:threadpool - impacted 15 MRs
* Bug 54078 - ntdll:pipe - impacted 11 MRs
* Bug 54140 - mmdevapi:render - impacted 5 MRs
* Bug 54005 - ole32:clipboard - impacted 5 MRs
* Bug 54037 - user32:msg - impacted 5 MRs
* Bug 54074 - ws2_32:sock - impacted 5 MRs
I classified the TestBot / GitLab CI results as follows:
* False positive
Cases where the CI system incorrectly claimed the MR introduces new failures. This is typically the case when the failures are already present in the nightly WineTest results.
* Bad merge
MRs that break a test and got merged anyway.
* Collateral Damage from a bad merge
The false positives (aka collateral damage) caused by one of the bad merges above.
* Outside interference
This identifies false positives that are not random and intrinsic to the test but that result from changes outside the Wine infrastructure, for instance certificates that expire, or configuration changes to servers that break the tests that depend on them.
Of those the only ones that a CI can really avoid are the first type, aka "False positive". So I calculated the corresponding weekly rate:
Adjusted False Positive rate
Week       | TestBot | GitLab CI
2022-11-14 |  21.9%  |   8.3%
2022-11-21 |   8.0%  |  21.6%
2022-11-28 |  14.7%  |  28.4%
2022-12-05 |   8.5%  |  24.5%
2022-12-12 |   0.0%  |  20.0%
Note that the TestBot's 8% rate for the 11-21 week is not representative because Wine was broken that week (collateral damage) which prevented the tests from running in Wine, and thus from contributing real "false positives". Also the 12-12 week is still incomplete obviously.
Even so I think this shows the TestBot is improving.
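For anyone who wants to redo this kind of calculation, here is a minimal Python sketch of one way to get a weekly rate from classified results. It assumes the rate is the share of a week's MRs that fall in the counted categories, which may not match exactly how the numbers above were derived. The category labels (including "ok" for MRs with no false alarm), the weekly_rates() helper and the sample records are all hypothetical and are not taken from failures-mr.txt.

```python
from collections import defaultdict

# Hypothetical label for the first category of the classification above.
ADJUSTED = {"false_positive"}


def weekly_rates(records, counted):
    """Return {week: percentage of MRs whose category is in `counted`}.

    `records` is an iterable of (week, category) pairs, one per MR.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, category in records:
        totals[week] += 1
        if category in counted:
            hits[week] += 1
    return {week: round(100.0 * hits[week] / totals[week], 1) for week in totals}


# Made-up sample records for illustration only, not real data.
sample = [
    ("2022-11-14", "false_positive"),
    ("2022-11-14", "outside_interference"),
    ("2022-11-14", "ok"),
    ("2022-11-14", "ok"),
    ("2022-11-21", "collateral_damage"),
    ("2022-11-21", "ok"),
]

adjusted_rates = weekly_rates(sample, ADJUSTED)
for week in sorted(adjusted_rates):
    print(f"{week} | {adjusted_rates[week]:.1f}%")
```

Passing a wider set of categories as `counted` would give a rate that also includes the failures caused by incidents.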
Here's a list of the incidents for the weeks above:
* 11-14 An external certificate revocation issue caused crypt32:cert to fail systematically. This impacted 14 merge requests and was fixed in MR!1360.
* 11-17 MR!1399 got merged despite the TestBot detecting that it prevented 32-bit Wine tests from running to completion. This impacted 39 merge requests. I could have reduced that number if I had been faster to reconfigure the TestBot to stop running the full 32-bit Wine test suite. This was fixed in MR!1524.
* 11-17 MR!1398 got merged despite the TestBot detecting that it broke ntoskrnl.exe:ntoskrnl on Windows 7. This was fixed in MR!1803.
* 11-22 MR!1495 got merged despite the TestBot detecting that it broke vbscript:run on Windows *. I don't have a record of the impacted MRs or of when it was fixed.
* 11-23 The b00a831d direct commit broke kernel32:process in Wine. This has since been fixed.
* 12-07 MR!1732 got merged despite the TestBot detecting that it broke taskschd:scheduler on Windows *. I immediately added a known failure entry and no MR got impacted. This was fixed in MR!1736.
Without filtering out the failures caused by these incidents, the false positive rate is:
Raw False Positive rate
Week       | TestBot | GitLab CI
2022-11-14 |  52.1%  |  27.1%
2022-11-21 |  50.0%  |  29.5%
2022-11-28 |  20.0%  |  33.7%
2022-12-05 |  19.1%  |  57.4%
2022-12-12 |   0.0%  |  20.0%
I think that also shows that the TestBot is improving.
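To see how much of the raw rate comes from the incidents, one can subtract the adjusted rate from the raw rate for each week. The short Python sketch below does that with the percentages copied from the two tables above. The incident_share() helper name is only for illustration, and the result is a difference in percentage points, not a count of MRs.

```python
# Percentages copied from the two tables above: (TestBot, GitLab CI).
adjusted = {
    "2022-11-14": (21.9, 8.3),
    "2022-11-21": (8.0, 21.6),
    "2022-11-28": (14.7, 28.4),
    "2022-12-05": (8.5, 24.5),
    "2022-12-12": (0.0, 20.0),
}
raw = {
    "2022-11-14": (52.1, 27.1),
    "2022-11-21": (50.0, 29.5),
    "2022-11-28": (20.0, 33.7),
    "2022-12-05": (19.1, 57.4),
    "2022-12-12": (0.0, 20.0),
}


def incident_share(raw, adjusted):
    """Return {week: (TestBot, GitLab CI)} raw minus adjusted, in points."""
    return {
        week: tuple(round(r - a, 1) for r, a in zip(raw[week], adjusted[week]))
        for week in raw
    }


share = incident_share(raw, adjusted)
for week in sorted(share):
    testbot, gitlab = share[week]
    print(f"{week} | {testbot:5.1f} | {gitlab:5.1f}")
```

For instance this gives 30.2 points for the TestBot and 18.8 points for the GitLab CI in the 11-14 week.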
I have attached the raw data I collected and shell snippets to extract various statistics (failures-mr.txt) as well as a spreadsheet import (failures.xls).