New failures: Where do they come from? How do they get in? All you've ever wanted to know (or not).
So I collected data about the new failure modes (see failure-new.txt), mostly from the start of July (older data is incomplete).
By new failure mode I mean any failure group that I can say started happening on a specific day in that range. This calls for some caveats:
* A failure mode is a group of failures that happen together in a given test unit and thus likely all have the same origin. Normally each failure mode is described in a bug.
In the rest of this document I'll just say 'failure' in place of 'failure mode'.
* This includes failures that happen on non-GitLab / non-TestBot machines.
* I don't know the origin of all of them, so there are certainly some that are not caused by a change in Wine. So not all are Wine regressions.
* But until their cause is known they still need to be investigated, so I'm still counting them when it seems appropriate. However I did not include those that were known from the start to be caused by external factors.
First the date range I used for each month:

  Jul   2023-06-30 - 2023-07-28   28 days
  Aug   2023-07-28 - 2023-08-28   31 days   (08-28 is about when the TestBot -> GitLab CI bridge broke)
  Sep   2023-09-01 - 2023-10-06   39 days
So how many new failures are there?

  Jul   36 new failures  -> 1.3 / day
  Aug    7 new failures  -> 0.2 / day   (vacation effect!)
  Sep   53 new failures  -> 1.4 / day
Note that a single MR may cause multiple tests to fail, but each new failing test is counted separately here on the premise that each one needs to be investigated and reported.
Where do they come from?
               Jul   Aug   Sep
  Unknown        8     4    18
  Commit         4     1     6
  MR             7     2    17
  Total         19     7    41
  Total / day    0.7   0.2   1.1
* These counts are deduplicated (one entry per origin rather than per failing test), except for the new failures of unknown origin obviously.
* The commits correspond to direct Wine commits, so they don't go through the GitLab CI or the TestBot.
* About half of the new failures have no known origin. This can be because they happen on machines I don't have access to and thus cannot bisect on. It can also be because the failure cannot be reproduced, which is annoyingly common.
* To keep up, 5 to 8 failures would have to be fixed every week (0.7 to 1.1 deduplicated new failures per day, times 7), as the sketch below spells out.
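For reference, here is the arithmetic behind the per-day and per-week figures, as a small Python back-of-the-envelope check using the raw counts, the deduplicated totals and the period lengths from the tables above:

    # Per-day rates for the raw and deduplicated new failure counts,
    # using the period lengths listed above.
    days = {"Jul": 28, "Aug": 31, "Sep": 39}
    raw = {"Jul": 36, "Aug": 7, "Sep": 53}
    dedup = {"Jul": 19, "Aug": 7, "Sep": 41}
    for month in days:
        rate = dedup[month] / days[month]
        print(f"{month}: raw {raw[month] / days[month]:.1f} / day, "
              f"dedup {rate:.1f} / day, about {round(rate, 1) * 7:.0f} / week")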
How do they get in?
                  Jul   Aug   Sep
  Untested          2     1     7
  Bad               2     0     0
  Module            1     0     1
  Extra configs     0     0     2
  Flaky             2     0     2
  WineTest only     1     0     0
  Bridge            0     0     7
  Misc              1     1     0
  Total             9     2    19
Here's a description of how the failures got in. These are deduplicated failure counts since we're not interested in the number of tests impacted but in how the commits got in (the percentages below are relative to the three-month total; see the sketch after this list):
o 33% Untested
  These correspond to direct Wine commits. As such they bypass the GitLab CI and the TestBot, so there was no way to detect them beforehand. There have been a lot of changes in the low-level code lately so I think this figure is usually lower.
o 23% Bridge
  These failures were detected by the TestBot but not forwarded to the MR page because the mailing list to GitLab bridge broke on or about 2023-08-28.
o 13% Flaky
  These failures are random and just did not happen when the MR got tested by the GitLab CI and the TestBot. One way to minimize the chances of this happening is to run the tests multiple times (see the probability sketch at the end of this section).
  But the TestBot already runs the tests on up to 24 Windows configurations and 10 Linux ones, which means random failures that are not configuration-specific have a very good chance of being caught. It's the other ones that get through, like those that only impact a specific locale or a specific Windows version.
o 7% Module
  When a patch modifies a module but not the tests it contains, the TestBot skips some test configurations, which can lead to some failures going undetected.
o 7% Extra configs
  This corresponds to failures specific to test configurations that the TestBot only uses for the nightly WineTest runs.
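For what it's worth, the percentages above appear to be taken over the three-month totals from the table (30 deduplicated failures in all); here is a small Python sketch recomputing them under that assumption:

    # Recompute the percentages from the three-month totals of the
    # "How do they get in?" table (assuming that is how they were derived).
    counts = {
        "Untested": (2, 1, 7), "Bad": (2, 0, 0), "Module": (1, 0, 1),
        "Extra configs": (0, 0, 2), "Flaky": (2, 0, 2),
        "WineTest only": (1, 0, 0), "Bridge": (0, 0, 7), "Misc": (1, 1, 0),
    }
    total = sum(sum(v) for v in counts.values())  # 30
    for name, v in counts.items():
        print(f"{name}: {sum(v)} -> {sum(v) / total:.0%}")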
In the above counts I omitted the following:

o 18 Unknown
  This corresponds to cases where the cause of the failure has not been identified, and thus how it got in undetected is not known either. Note that there is one exception: if the failure does not impact the GitLab and TestBot machines then the reason is automatically "Non TestBot".
o 17 Non TestBot
  These new failures impact neither the GitLab nor the TestBot machines. This corresponds to macOS-specific failures for instance. Detecting them beforehand is therefore impossible... except by expanding the set of test machines.
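To make the Flaky point above more concrete, here is a rough Python sketch of the odds involved. It assumes a flaky failure triggers independently with the same probability on every test run, which is of course a simplification (real flaky failures are often configuration- or timing-dependent):

    # Chance that a flaky failure slips through N independent test runs,
    # assuming it triggers with probability p on each run (a simplification).
    def miss_probability(p, runs):
        return (1 - p) ** runs

    # A failure that triggers 10% of the time, across ~34 TestBot runs
    # (up to 24 Windows + 10 Linux configurations):
    print(miss_probability(0.10, 34))  # ~0.03, so it is very likely caught
    # But if only one configuration can show it, it gets a single chance:
    print(miss_probability(0.10, 1))   # 0.9, so it usually slips through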
Are they fixed yet?
              Jul        Aug        Sep
  Fixed       15 (47%)    2 (33%)   21 (40%)
  Not fixed   17 (53%)    4 (67%)   31 (60%)
Not there yet.
* Take these with a big grain of salt: the bugs are not always closed when a failure is fixed. I may also have forgotten to mark some entries as fixed in failures-new.txt but I think that part is mostly okay now.
* Older failures don't seem more likely to be fixed. But then failures are not necessarily reported in the order they appear.