It's been about a year since I started collecting data, and also since GitLab CI was introduced. So here's an update on the false positive rates of the merge request and nightly Wine test runs.
Reminder: A false positive (FP) is when the TestBot or GitLab CI says a failure is new when it is not.
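To make the definition concrete, here is a minimal sketch (with hypothetical data and function names, not the actual TestBot or CI code) of how a daily false positive rate could be computed: the fraction of MRs where a pre-existing failure was reported as new.

```python
def daily_fp_rate(mr_results):
    """mr_results: list of (flagged_as_new, actually_new) pairs, one per MR.

    A false positive is an MR where a failure was flagged as new
    even though it was pre-existing.
    """
    if not mr_results:
        return 0.0
    fps = sum(1 for flagged, actually_new in mr_results
              if flagged and not actually_new)
    return fps / len(mr_results)

# Hypothetical day: 10 MRs, 2 of them got a pre-existing failure
# reported as new.
results = [(True, False)] * 2 + [(False, False)] * 8
print(daily_fp_rate(results))  # → 0.2
```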
* TestBot
The FP rate stayed around 10% until the end of August, when the GitLab bridge to the mailing list broke (see graphs). Looking at it differently, except in June, on any given day there was a better than 40% chance that fewer than 10% of the MRs would get a false positive (and a more than 70% chance of fewer than 25%).
But with the bridge gone the TestBot failures are no longer relayed to the MRs, so collecting data is impractical, and fairly irrelevant too.
* GitLab CI
The GitLab CI's FP rate stayed below 30% until mid-May but has remained clearly above that since then. The 5-week average even peaked at 60% in early August and is not really getting better.
Changing perspective, since March fewer than 20% of the days had a false positive rate below 10%. And in August and September every single day had a false positive rate above 10%.
Also, before August the chances of having an FP rate lower than 25% were much greater, usually 40% or more. But that rate has plummeted and is now below 10%.
The 50% FP line shows great swings, which I think are caused by periods where one or more tests have a 100% failure rate and do not get fixed for weeks. Still, in early 2023 it was at 85% or more, but since then there has been a clear downward trend where both the peaks and troughs keep getting lower.
Conclusions:
* I hoped the TestBot FP rate would improve but it has only held steady. It may be that this 10% failure rate is irreducible because of the delay between when a new failure first pops up and when the TestBot knows how to identify it (i.e. when I add it to the known failures page: https://testbot.winehq.org/FailuresList.pl).
Stemming the flow of new failures introduced by bad MRs may help lower that rate. But new failures can also happen when a certificate expires, a test server goes down, or the build platform changes, for instance. So there will likely always be a residual FP rate.
* The GitLab CI seemed to make progress at first, but since mid-March it has been drifting further from the goal of having no false positives.
Notes:
* Comparing the TestBot and GitLab CI failure rates is akin to comparing apples and oranges.
The GitLab CI does a single run of the full test suite in Wine (except for a handful of tests), plus a single 64-bit test.
The TestBot does:
* 1 full 64-bit run in Wine (no exceptions),
* 1 run of the modified tests in a Windows-on-Windows Wine environment,
* 1 run of all the tests of the modified modules in Wine,
* 7 plain 32-bit Wine runs in various locales,
* 24 test runs in various Windows, locale, GPU and screen layout configurations.
And it still gets one half to one third of the GitLab CI's false positive rate.
* Improving the false positive rate does not mean that the Wine tests have fewer failures. But getting reliable results from the CI was deemed a necessary step for developers to trust it and to know they need to rework their MR when the results are bad.
It also means less work for the maintainer to discriminate between the MRs that introduce new failures and those that don't. And less chance of making mistakes too.
* Conversely, improving the tests does not necessarily improve the false positive rate. We have 230 failing test units, so one can fix 229 of them, but if the last one fails systematically the false positive rate will stay pegged at 100%.
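The arithmetic above can be sketched as follows (hypothetical numbers and function names, purely for illustration): a single test that fails on every run flags every MR, no matter how many other tests were fixed.

```python
def mr_has_false_positive(failure_probs, rng):
    # An MR gets a false positive if any pre-existing flaky or broken
    # test fires on its run. failure_probs holds the per-run failure
    # probability of each remaining failing test unit.
    return any(rng.random() < p for p in failure_probs)

import random
rng = random.Random(0)

# 229 of the 230 failing test units fixed; the last one fails on every run.
remaining = [1.0]
runs = 1000
fp_rate = sum(mr_has_false_positive(remaining, rng) for _ in range(runs)) / runs
print(fp_rate)  # → 1.0, regardless of the 229 fixes
```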
Reducing the number of false positives requires either focusing on the tests that cause them, or having countermeasures built into the CI... as is the case for the TestBot.