On Mon, 23 Oct 2023, Zeb Figura wrote: [...]
> I do quite like the patterns page. Though at this point, when I take time to fix tests, I find what's most helpful is the already filed bugs. How much effort do those bugs take to file?
It can be quite time consuming, depending on the volume of new failures and whether they are hard to track down. The procedure I go through looks something like this:
* First I try to identify a group of related failures. Usually that's easy, but it can be confusing when a lot of non-systematic new failures are mixed with lots of pre-existing ones. Also, I sometimes don't know enough about the test to tell whether all the failures can be fixed in one go or whether some will require a separate fix. I usually try to err on the side of not mixing things up; developers should feel free to mark bugs as duplicates when appropriate.
* Figure out how to reproduce the bug.
- That can be tricky when the test does not fail on its own, because then I have to figure out which other test is interfering (and that can be a dead end).
- Also, when the test does not always fail, bisects get more complicated.
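For the intermittent case, the first thing I want is a rough failure rate before attempting anything else. A minimal sketch of the rerun loop I have in mind (the test command is purely a placeholder; substitute whatever test unit is being investigated):

    # rerun.py: estimate how often an intermittent test fails.
    # The command is a placeholder; adjust it to your build layout.
    import subprocess

    CMD = ["./wine", "ole32_test.exe", "clipboard"]  # hypothetical test unit
    RUNS = 50

    failures = sum(
        # assumes the test binary exits non-zero when a test fails
        subprocess.run(CMD, capture_output=True).returncode != 0
        for _ in range(RUNS)
    )
    print(f"{failures}/{RUNS} runs failed")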
* Identify the commit that caused the test to fail.
- This is only doable on the machines I have access to. That makes macOS failures, for instance, easier to deal with since I can just skip this step (and many others).
- But identifying the commit helps figure out who is most likely to know what's going on and how to fix the issue, so I feel it's an important step.
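When the failure is deterministic, `git bisect run` can automate the search: it accepts any script that exits 0 for a good commit, 1 for a bad one, and 125 to skip. A sketch of such a helper, reusing the placeholder test command from above (the build and test commands are assumptions about the local setup):

    # bisect-check.py: run as `git bisect run python3 bisect-check.py`.
    import subprocess, sys

    # Rebuild incrementally; exit 125 so git bisect skips commits
    # that do not build.
    if subprocess.run(["make", "-j8"], capture_output=True).returncode != 0:
        sys.exit(125)

    # Placeholder test command; assumes it exits non-zero on failure.
    CMD = ["./wine", "ole32_test.exe", "clipboard"]
    sys.exit(0 if subprocess.run(CMD, capture_output=True).returncode == 0 else 1)

With an intermittent failure the helper would instead have to loop like the rerun sketch above and pick a failure-rate threshold, which is exactly why those bisects get complicated.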
* Identify the date of the first failure.
- Sometimes it's obvious from the patterns page.
- But when the test unit already has lots of failures, I grep a mirror of the test.winehq.org reports, sorted by date. (I also use that mirror to build myself a patterns page with 8 months of history.)
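The grep step is mechanical once the reports are mirrored locally. A sketch of what it amounts to, assuming a mirror directory where report paths sort chronologically (the path, layout, and example message are all assumptions, not the mirror's actual format):

    # first-failure.py: find the earliest mirrored report that contains
    # a given failure message. Assumes report paths sort by date.
    import re, sys
    from pathlib import Path

    MIRROR = Path("~/winetest-mirror").expanduser()  # hypothetical path
    pattern = re.compile(sys.argv[1])  # e.g. 'clipboard\.c:\d+: Test failed'

    for report in sorted(MIRROR.rglob("*.report")):
        if pattern.search(report.read_text(errors="replace")):
            print(f"first matching report: {report}")
            break
    else:
        print("no match in the mirrored reports")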
* And then there is the question of identifying which tests need to be looked at:
- I scan all the TestBot's WineTest job reports (ideally daily), updating failures-winetest.txt in the process. The TestBot is now quite good at identifying the new failures, so on good days that's fast; on bad days there are a lot of reports to look at.
That's the most efficient way to get a list of new failures, but only for those happening _in the TestBot_.
I usually try to file a bug as soon as possible so I can update the failures page and be sure the TestBot will not report the failure as new again.
Also, the TestBot automatically identifies unchanging failure messages and does not report them as new on the following days. That can lead one to think a failure was a one-off when in fact it happens systematically.
- I also scan the last job of every MR to identify which failures were present, updating failures-mr.txt in the process: those are the failures that are not considered to be new (otherwise the MR should not have been merged). When it's all green this is obviously fast, but otherwise it requires looking at all the logs. If a failure happens only once it may not be worth reporting; but failures-mr.txt shows me which ones are most common, and I try to report those first (a small tally sketch of that follows below).
This also allows me to identify failures that only happen in the GitLab CI and not in full WineTest runs.
- And from time to time I just go through the patterns page to identify non-TestBot, non-GitLab CI new failures, such as those that happen on Remi's boxes or mine.
Scanning the patterns page takes more time, so I don't do it as often.
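The "which ones are most common" part of the MR scan above is easy to mechanize. A sketch, assuming failures-mr.txt simply accumulates one failure message per line (the real file format may well differ):

    # tally-failures.py: rank the failure messages recorded in
    # failures-mr.txt so the most common ones get reported first.
    # Assumes one failure message per line; the real format may differ.
    from collections import Counter
    from pathlib import Path

    lines = Path("failures-mr.txt").read_text().splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())

    for message, count in counts.most_common(10):
        print(f"{count:4d}  {message}")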
* That's all for reporting new failures, but sometimes failures get fixed without the bug being closed (which is quite understandable: the developer may just not be aware of the bug).
That does not have much of a negative impact on the TestBot, so I give closing bugs a much lower priority (though it can artificially inflate the failure mode count on the patterns page).
Closing these mostly involves looking at the TestBot's failures page and checking the entries that have not been matched in a while (or ever).
There's also a more interesting reason to look at those: identifying the entries where I got the regexps wrong, so the corresponding failures may still be reported as new (something I should notice in failures-winetest.txt, but only if the TestBot does not already identify the failure as old).
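Spotting the entries with broken regexps can be partly automated: any known-failure regexp that no longer matches a recent batch of failure messages is either fixed or mistyped. A sketch, assuming one regexp per line in the failures file and a plain list of recent messages (both file formats are assumptions):

    # stale-regexps.py: flag known-failure regexps that match nothing,
    # i.e. the failure was fixed or the regexp is wrong.
    # Both file formats below are assumptions.
    import re
    from pathlib import Path

    regexps = [l for l in Path("failures-winetest.txt").read_text().splitlines()
               if l.strip()]
    messages = Path("recent-messages.txt").read_text().splitlines()

    for r in regexps:
        try:
            if not any(re.search(r, m) for m in messages):
                print(f"never matched: {r}")
        except re.error as err:
            print(f"invalid regexp: {r} ({err})")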
* Finally, from time to time (less than once a month), I go through the failures page to identify the entries that are no longer needed because the failure has been fixed.
> Is that sustainable (considering the other tasks on your plate)?
When I have to focus on other things I generally have to stop looking at the tests for a while, so it's not totally sustainable.