The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts? --Juan
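To make the difference concrete, here is a rough sketch of the two ways of counting (this is not the actual test.winehq.org code, and the per-file numbers are invented, loosely modelled on the mapi32 case):

# Each entry: (test file, tests run, tests failed). Invented data.
results = [
    ("mapi32:prop",   139, 134),
    ("kernel32:path", 2500,  0),
    ("user32:msg",   10000,  1),
]

# What the front page effectively counts: a file is "failing" if it had
# any failure at all.
failing_files = sum(1 for _, _, failed in results if failed > 0)
print("failing files: %.1f%%" % (100.0 * failing_files / len(results)))

# What a per-test percentage would count instead.
total_run = sum(run for _, run, _ in results)
total_failed = sum(failed for _, _, failed in results)
print("failing tests: %.2f%%" % (100.0 * total_failed / total_run))

With those invented numbers, two of the three files count as failing while only about 1% of the individual checks do; that's the gap I'm talking about.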
On Wed, Feb 11, 2009 at 3:58 PM, Juan Lang juan.lang@gmail.com wrote:
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
Maybe a test.winehq.org/trends page showing some nice (hopefully upwards) trend graphs of number of succeeding tests?
--John Klehm
John Klehm wrote:
On Wed, Feb 11, 2009 at 3:58 PM, Juan Lang juan.lang@gmail.com wrote:
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
Maybe a test.winehq.org/trends page showing some nice (hopefully upwards) trend graphs of number of succeeding tests?
--John Klehm
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
On Thu, Feb 12, 2009 at 8:27 AM, Paul Vriens paul.vriens.wine@gmail.com wrote:
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
Nice. :) What are you using to generate that?
--John Klehm
John Klehm wrote:
On Thu, Feb 12, 2009 at 8:27 AM, Paul Vriens paul.vriens.wine@gmail.com wrote:
I do it for my own boxes (see attachment). The spikes (up and down) are mainly when I didn't run the tests on all my boxes. But you can see the overall trend.
Nice. :) What are you using to generate that?
--John Klehm
Google Chart API : http://code.google.com/apis/chart/
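Roughly like this, in case anyone wants to do the same; this is a stripped-down sketch rather than the exact script, and the daily failure counts are invented:

# Build a Google Chart API line-chart URL for a failure trend.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

failures_per_day = [134, 130, 118, 90, 42, 6]   # invented data

params = {
    "cht": "lc",                                  # line chart
    "chs": "400x200",                             # chart size in pixels
    "chtt": "winetest failures",                  # chart title
    "chd": "t:" + ",".join(str(n) for n in failures_per_day),
    "chds": "0,%d" % max(failures_per_day),       # data scaling
    "chxt": "y",                                  # show a y axis
    "chxr": "0,0,%d" % max(failures_per_day),     # y axis range
}

print("http://chart.apis.google.com/chart?" + urlencode(params))

Pasting the printed URL into a browser returns the chart as an image.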
On Wed, Feb 11, 2009 at 1:58 PM, Juan Lang juan.lang@gmail.com wrote:
The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts?
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
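For scale, with invented totals for one full day's run it would look like this:

# Invented totals, just to show what the ratio looks like.
total_tests_run = 1500000
total_test_failures = 150
print("failing tests: %.4f%%" % (100.0 * total_test_failures / total_tests_run))
# prints: failing tests: 0.0100%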
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
James Hawkins wrote:
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
Why not just print absolute numbers for tests failed and passed? Does a percentage even offer any benefit here?
With absolute numbers we could see progress in both tests being fixed and in tests being written.
Thanks, Scott Ritchie
2009/2/12 James Hawkins truiken@gmail.com:
On Wed, Feb 11, 2009 at 3:27 PM, Alexandre Julliard julliard@winehq.org wrote:
James Hawkins truiken@gmail.com writes:
We should leave the failing files percentage up (note the name change) and add a failing tests percentage next to it. The failing tests percentage should be total_test_failures / total_tests_run.
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
Ok you're right. I wasn't thinking on that scale. I assumed we had more than 0.01% failures.
Just goes to show we're not in the advertising/marketing industry! Facts are good, useful data is good, fudging the figures to make ourselves look good is bad :)
2009/2/11 Alexandre Julliard julliard@winehq.org
That's not a useful number, many files run a lot of tests, of which a huge majority always succeeds. Having a single failure among 10,000 tests means that the test failed, and it's something bad that should be taken care of. Showing that as a 99.99% success would be very misleading.
I think what he means is a value for the whole run, not for each file. That way we would get both the number of failing files and the number of failing tests for one run. I think that could be useful; at least it would show much more variation and give better feedback on fixing tests than a file-failure statistic does. It would be a more responsive number.
Juan Lang wrote:
The front page of test.winehq.org shows statistics about failed tests, but it doesn't seem to count the number of individual tests that passed and failed; rather, it counts the number of files that had any failures.
So, for example, about a week ago I got a fix committed for some failing mapi32 tests. Looking at the machines with test failures, before the fix was committed, 139 tests were run, with 134 of them failing, whereas after the fix was committed, the same number of tests were run, with only 6 of them failing. Nonetheless, the 4th of February shows a higher failure rate (14.6%) than the 3rd of February (12.4%).
I know other tests could have started failing in the interim, but it seems like we've been putting a fair amount of effort into reducing test failures lately, while the percent of failed tests isn't going down, at least not on the main page. If you look at a particular day's results, the numbers look a bit better over time.
I'm not sending a patch, because there may be different opinions on this. That is, perhaps some people like to see a statistic on the number of files with failing tests on any machine, which the front page appears to show, while others may like to see the number of failures in a particular file, which a day's results show. My own opinion is that it's hard to get motivated to fix something without some sort of positive feedback for it, so changing the front page would be better.
My own feeling is that there are far fewer failing tests now than there used to be, and I'd sure like to see that reflected somewhere at a quick glance. Thoughts? --Juan
I don't think that showing individual tests (the actual counts inside dll:name) will help, as the error rate will be marginal (as AJ pointed out).
If you look at the main page you will see a number for Win95, for example. This number shows you how many dll:name tests had a failure on one or more Win95 boxes. This means that one box can mess up a platform's stats quite badly: if I have 10 Win95 boxes with no failures and one with all dll:name tests failing, the failure count for that platform would equal the total number of dll:name tests.
The cumulative 'Failures' number, however, is calculated differently: it is the sum of the per-platform failure counts, divided by ('number of different platforms on that line of the main page' x 'total number of unique dll:name tests').
So maybe (and this has been discussed in the past) the 'Failures' number should be the number of unique dll:name tests that fail on one or more boxes (just like the per-platform numbers, but across all platforms).
That way we would get an indication of how many dll:name tests need some fixing. It won't do our figures any good, though: my 6 boxes, for example, show a 5.0% failure rate on the main page, but with this other approach it would be 14.7%.
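In code, the difference between the current calculation and that alternative would look roughly like this (my reading of it only, with made-up data, not the real report code):

# failures[platform] = set of dll:name tests failing on one or more
# boxes of that platform. Made-up data.
failures = {
    "Win95": {"mapi32:prop", "user32:msg"},
    "Win98": {"user32:msg"},
    "WinXP": set(),
}
total_unique_tests = 120   # total number of unique dll:name tests

# Current approach: sum the per-platform failure counts and divide by
# (number of platforms x total number of unique dll:name tests).
current = (100.0 * sum(len(f) for f in failures.values())
           / (len(failures) * total_unique_tests))

# Alternative: count each dll:name test once if it fails anywhere.
alternative = (100.0 * len(set().union(*failures.values()))
               / total_unique_tests)

print("current    : %.1f%%" % current)      # 3 / 360 -> 0.8%
print("alternative: %.1f%%" % alternative)  # 2 / 120 -> 1.7%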
So I don't think our numbers are too pessimistic. We are bitten by the fact that more and more people run winetest, and that of course raises the likelihood of failing tests (non-administrator, different locales, no C: drive, ...).