Juan Lang wrote:
The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to take into account the number of individual tests that passed and failed; rather, it counts the number of test files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures: before the fix was committed, 139 tests were run and 134 of them failed; after the fix, the same number of tests were run and only 6 failed. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but we've been putting a fair amount of effort into reducing test failures lately, and the percentage of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers do look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts? --Juan
I don't think that showing individual tests (the actual counts inside each dll:name test) will help, as the error rate would be marginal (as pointed out by AJ).
If you look at the main page you will see a number for Win95, for example. This number shows how many dll:name tests had a failure on one or more Win95 boxes. This means that one box can mess up the platform's stats quite badly: if I have 10 Win95 boxes with no failures and one with all dll:name tests failing, the failure count for that platform would be equal to the total number of dll:name tests.
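To illustrate the effect (a rough sketch, not the actual test.winehq.org scripts; the box names, test names and counts below are invented):

    def platform_failures(boxes):
        # boxes: dict of box name -> set of dll:name tests that failed on that box
        failed = set()
        for failing in boxes.values():
            failed |= failing        # a test counts as failed if it fails on any box
        return failed

    win95_boxes = {"box%d" % i: set() for i in range(1, 11)}   # ten clean Win95 boxes
    win95_boxes["box11"] = {"mapi32:util", "kernel32:process", "user32:msg"}  # one bad box

    print(len(platform_failures(win95_boxes)))   # 3: every test the bad box failed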
The cumulative 'Failures' number, however, is calculated differently. It's simply the sum of the 'overall platform failures' for each platform, divided by ('number of different platforms on that line of the main page' x 'total number of unique dll:name tests').
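As I understand it, the arithmetic is roughly the following (again just a sketch; the counts are made up purely to show the calculation):

    def overall_failure_rate(per_platform_failures, total_unique_tests):
        # per_platform_failures: list of 'overall platform failures', one per platform
        platforms = len(per_platform_failures)
        return sum(per_platform_failures) / float(platforms * total_unique_tests)

    # e.g. 3 platforms on one line, 150 unique dll:name tests,
    # and per-platform failure counts of 20, 10 and 5:
    print("%.1f%%" % (100 * overall_failure_rate([20, 10, 5], 150)))   # 7.8%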
So maybe (and it has been discussed in the past) the 'Failures' number should be the number of unique dll:name tests that fail on one or more boxes (just like the per-platform numbers, but computed overall).
That way we'd get an indication of how many dll:name tests need some fixing. It won't do our figures any good, though: my 6 boxes, for example, show a 5.0% failure rate on the main page, but using this other approach it would have been 14.7%.
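That alternative count would look roughly like this (same caveat: a sketch with invented data, not my real results):

    def unique_failure_rate(all_boxes, total_unique_tests):
        # all_boxes: one set of failing dll:name tests per box, across all platforms
        failed = set()
        for failing in all_boxes:
            failed |= failing
        return len(failed) / float(total_unique_tests)

    boxes = [{"mapi32:util"},
             {"mapi32:util", "user32:msg"},
             set()]                    # third box is clean
    print("%.1f%%" % (100 * unique_failure_rate(boxes, 150)))   # 2 unique failures: 1.3%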
So I don't think our numbers are too pessimistic. We are bitten by the fact that more and more people run winetest, which of course raises the chance of failing tests (non-administrator accounts, different locales, no C: drive, ...).