The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts? --Juan
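To make the difference concrete, here is a rough sketch of the two ways of counting (this is not the actual test.winehq.org code, and the per-file numbers are invented, loosely modelled on the mapi32 case):

# Each entry: (test file, tests run, tests failed). Invented data.
results = [
    ("mapi32:prop",   139, 134),
    ("kernel32:path", 2500,  0),
    ("user32:msg",   10000,  1),
]

# What the front page effectively counts: a file is "failing" if it had
# any failure at all.
failing_files = sum(1 for _, _, failed in results if failed > 0)
print("failing files: %.1f%%" % (100.0 * failing_files / len(results)))

# What a per-test percentage would count instead.
total_run = sum(run for _, run, _ in results)
total_failed = sum(failed for _, _, failed in results)
print("failing tests: %.2f%%" % (100.0 * total_failed / total_run))

With those invented numbers, two of the three files count as failing while only about 1% of the individual checks do; that's the gap I'm talking about.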
On Wed, Feb 11, 2009 at 3:58 PM, Juan Lang juan.lang@gmail.com wrote:
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
Maybe a test.winehq.org/trends page showing some nice (hopefully upwards) trend graphs of number of succeeding tests?
--John Klehm
John Klehm wrote:
On Wed, Feb 11, 2009 at 3:58 PM, Juan Lang juan.lang@gmail.com wrote:
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
Maybe a test.winehq.org/trends page showing some nice (hopefully upwards) trend graphs of number of succeeding tests?
--John Klehm
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
On Thu, Feb 12, 2009 at 8:27 AM, Paul Vriens paul.vriens.wine@gmail.com wrote:
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
Nice. :) What are you using to generate that?
--John Klehm
John Klehm wrote:
On Thu, Feb 12, 2009 at 8:27 AM, Paul Vriens paul.vriens.wine@gmail.com wrote:
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
Nice. :) What are you using to generate that?
--John Klehm
Google Chart API : http://code.google.com/apis/chart/
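Roughly like this, in case anyone wants to do the same; this is a stripped-down sketch rather than the exact script, and the daily failure counts are invented:

# Build a Google Chart API line-chart URL for a failure trend.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

failures_per_day = [134, 130, 118, 90, 42, 6]   # invented data

params = {
    "cht": "lc",                                  # line chart
    "chs": "400x200",                             # chart size in pixels
    "chtt": "winetest failures",                  # chart title
    "chd": "t:" + ",".join(str(n) for n in failures_per_day),
    "chds": "0,%d" % max(failures_per_day),       # data scaling
    "chxt": "y",                                  # show a y axis
    "chxr": "0,0,%d" % max(failures_per_day),     # y axis range
}

print("http://chart.apis.google.com/chart?" + urlencode(params))

Pasting the printed URL into a browser returns the chart as an image.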
On Wed, Feb 11, 2009 at 1:58 PM, Juan Lang juan.lang@gmail.com wrote:
The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
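For scale, with invented totals for one full day's run it would look like this:

# Invented totals, just to show what the ratio looks like.
total_tests_run = 1500000
total_test_failures = 150
print("failing tests: %.4f%%" % (100.0 * total_test_failures / total_tests_run))
# prints: failing tests: 0.0100%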
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
James Hawkins wrote:
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
Why not just print absolute numbers for tests failed and passed? Does a percentage even offer any benefit here?
With absolute numbers we could see progress in both tests being fixed and in tests being written.
Thanks, Scott Ritchie
2009/2/12 James Hawkins truiken@gmail.com:
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
Just goes to show we're not in the advertising/marketing industry! Facts are good, useful data is good, fudging the figures to make ourselves look good is bad :)
2009/2/11 Alexandre Julliard julliard@winehq.org
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
I think what he means is a value for the whole run, not for each file. That way we would get both the number of failing files and the number of failing tests for one run. I think that could be useful; at least it would show much more variation and give better feedback on fixing tests than a file-failure statistic does. It would be a more responsive number.
Juan Lang wrote:
The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts? --Juan
I don't think that showing individual tests (the actual counts inside dll:name) will help, as the error rate will be marginal (as AJ pointed out).
If you look at the main page you will see a number for Win95, for example. This number shows you how many dll:name tests had a failure on one or more Win95 boxes. This means that one box can mess up a platform's stats quite badly: if I have 10 Win95 boxes with no failures and one with all dll:name tests failing, the failure count for that platform would equal the total number of dll:name tests.
The cumulative 'Failures' number, however, is calculated differently: it is the sum of the per-platform failure counts, divided by ('number of different platforms on that line of the main page' x 'total number of unique dll:name tests').
So maybe (and this has been discussed in the past) the 'Failures' number should be the number of unique dll:name tests that fail on one or more boxes (just like the per-platform numbers, but across all platforms).
That way we would get an indication of how many dll:name tests need some fixing. It won't do our figures any good, though: my 6 boxes, for example, show a 5.0% failure rate on the main page, but with this other approach it would be 14.7%.
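In code, the difference between the current calculation and that alternative would look roughly like this (my reading of it only, with made-up data, not the real report code):

# failures[platform] = set of dll:name tests failing on one or more
# boxes of that platform. Made-up data.
failures = {
    "Win95": {"mapi32:prop", "user32:msg"},
    "Win98": {"user32:msg"},
    "WinXP": set(),
}
total_unique_tests = 120   # total number of unique dll:name tests

# Current approach: sum the per-platform failure counts and divide by
# (number of platforms x total number of unique dll:name tests).
current = (100.0 * sum(len(f) for f in failures.values())
           / (len(failures) * total_unique_tests))

# Alternative: count each dll:name test once if it fails anywhere.
alternative = (100.0 * len(set().union(*failures.values()))
               / total_unique_tests)

print("current    : %.1f%%" % current)      # 3 / 360 -> 0.8%
print("alternative: %.1f%%" % alternative)  # 2 / 120 -> 1.7%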
So I don't think our numbers are too pessimistic. We are bitten by the fact that more and more people run winetest, and that of course raises the likelihood of failing tests (non-administrator, different locales, no C: drive, ...).