New failures: Where do they come from? How do they get in? All you've ever wanted to know (or not).
So I collected data about the new failure modes (see failure-new.txt), mostly from the start of July (older data is incomplete).
By new failure mode I mean any failure group that I can say started happening on a specific day in that range. This calls for some caveats:
* A failure mode is a group of failures that happen together in a given test unit and thus likely all have the same origin. Normally each failure mode is described in a bug.
In the rest of this document I'll just say 'failure' in place of 'failure mode'.
* This includes failures that happen on non-GitLab / non-TestBot machines.
* I don't know the origin of all of them, so there are certainly some that are not caused by a change in Wine. So not all are Wine regressions.
* But until their cause is known they still need to be investigated, so I'm still counting them when it seems appropriate. However I did not include those that were known from the start to be caused by external factors.
First the date range I used for each month:

  Jul   2023-06-30 - 2023-07-28   28 days
  Aug   2023-07-28 - 2023-08-28   31 days   (08-28 is about when the TestBot -> GitLab CI bridge broke)
  Sep   2023-09-01 - 2023-10-06   39 days
So how many new failures are there?

  Jul   36 new failures  -> 1.3 / day
  Aug    7 new failures  -> 0.2 / day   (vacation effect!)
  Sep   53 new failures  -> 1.4 / day
Note that a single MR may cause multiple tests to fail, but each new failing test is counted separately here on the premise that each one needs to be investigated and reported.
Where do they come from?
               Jul   Aug   Sep
  Unknown        8     4    18
  Commit         4     1     6
  MR             7     2    17
  Total         19     7    41
  Total / day    0.7   0.2   1.1
* These counts are deduplicated (one entry per origin rather than per failing test), except for the new failures of unknown origin obviously.
* The commits correspond to direct Wine commits, so they don't go through the GitLab CI or the TestBot.
* About half of the new failures have no known origin. This can be because they happen on machines I don't have access to and thus cannot bisect on. It can also be because the failure cannot be reproduced, which is annoyingly common.
* To keep up, 5 to 8 failures would have to be fixed every week (0.7 to 1.1 deduplicated new failures per day, times 7), as the sketch below spells out.
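For reference, here is the arithmetic behind the per-day and per-week figures, as a small Python back-of-the-envelope check using the raw counts, the deduplicated totals and the period lengths from the tables above:

    # Per-day rates for the raw and deduplicated new failure counts,
    # using the period lengths listed above.
    days = {"Jul": 28, "Aug": 31, "Sep": 39}
    raw = {"Jul": 36, "Aug": 7, "Sep": 53}
    dedup = {"Jul": 19, "Aug": 7, "Sep": 41}
    for month in days:
        rate = dedup[month] / days[month]
        print(f"{month}: raw {raw[month] / days[month]:.1f} / day, "
              f"dedup {rate:.1f} / day, about {round(rate, 1) * 7:.0f} / week")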
How do they get in?
                  Jul   Aug   Sep
  Untested          2     1     7
  Bad               2     0     0
  Module            1     0     1
  Extra configs     0     0     2
  Flaky             2     0     2
  WineTest only     1     0     0
  Bridge            0     0     7
  Misc              1     1     0
  Total             9     2    19
Here's a description of how the failures got in. These are deduplicated failure counts since we're not interested in the number of tests impacted but in how the commits got in (the percentages below are relative to the three-month total; see the sketch after this list):
o 33% Untested
  These correspond to direct Wine commits. As such they bypass the GitLab CI and the TestBot, so there was no way to detect them beforehand. There have been a lot of changes in the low-level code lately so I think this figure is usually lower.
o 23% Bridge
  These failures were detected by the TestBot but not forwarded to the MR page because the mailing list to GitLab bridge broke on or about 2023-08-28.
o 13% Flaky
  These failures are random and just did not happen when the MR got tested by the GitLab CI and the TestBot. One way to minimize the chances of this happening is to run the tests multiple times (see the probability sketch at the end of this section).
  But the TestBot already runs the tests on up to 24 Windows configurations and 10 Linux ones, which means random failures that are not configuration-specific have a very good chance of being caught. It's the other ones that get through, like those that only impact a specific locale or a specific Windows version.
o 7% Module
  When a patch modifies a module but not the tests it contains, the TestBot skips some test configurations, which can lead to some failures going undetected.
o 7% Extra configs
  This corresponds to failures specific to test configurations that the TestBot only uses for the nightly WineTest runs.
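For what it's worth, the percentages above appear to be taken over the three-month totals from the table (30 deduplicated failures in all); here is a small Python sketch recomputing them under that assumption:

    # Recompute the percentages from the three-month totals of the
    # "How do they get in?" table (assuming that is how they were derived).
    counts = {
        "Untested": (2, 1, 7), "Bad": (2, 0, 0), "Module": (1, 0, 1),
        "Extra configs": (0, 0, 2), "Flaky": (2, 0, 2),
        "WineTest only": (1, 0, 0), "Bridge": (0, 0, 7), "Misc": (1, 1, 0),
    }
    total = sum(sum(v) for v in counts.values())  # 30
    for name, v in counts.items():
        print(f"{name}: {sum(v)} -> {sum(v) / total:.0%}")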
In the above counts I omitted the following:

o 18 Unknown
  This corresponds to cases where the cause of the failure has not been identified, and thus how it got in undetected is not known either. Note that there is one exception: if the failure does not impact the GitLab and TestBot machines then the reason is automatically "Non TestBot".
o 17 Non TestBot
  These new failures impact neither the GitLab nor the TestBot machines. This corresponds to macOS-specific failures for instance. Detecting them beforehand is therefore impossible... except by expanding the set of test machines.
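To make the Flaky point above more concrete, here is a rough Python sketch of the odds involved. It assumes a flaky failure triggers independently with the same probability on every test run, which is of course a simplification (real flaky failures are often configuration- or timing-dependent):

    # Chance that a flaky failure slips through N independent test runs,
    # assuming it triggers with probability p on each run (a simplification).
    def miss_probability(p, runs):
        return (1 - p) ** runs

    # A failure that triggers 10% of the time, across ~34 TestBot runs
    # (up to 24 Windows + 10 Linux configurations):
    print(miss_probability(0.10, 34))  # ~0.03, so it is very likely caught
    # But if only one configuration can show it, it gets a single chance:
    print(miss_probability(0.10, 1))   # 0.9, so it usually slips through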
Are they fixed yet?
              Jul        Aug        Sep
  Fixed       15 (47%)    2 (33%)   21 (40%)
  Not fixed   17 (53%)    4 (67%)   31 (60%)
Not there yet.
* Take these with a big grain of salt: the bugs are not always closed when a failure is fixed. I may also have forgotten to mark some entries as fixed in failures-new.txt but I think that part is mostly okay now.
* Older failures don't seem more likely to be fixed. But then failures are not necessarily reported in the order they appear.