(This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
I can think of two sources of noise:
- 32 and 64 bit results are mixed together
- we don't have a stable stable (sic) of machines running the tests
Removing these two sources of noise might be as simple as
- omit 64 bit results, and
- omit computers for which results are not consistently available throughout the time range being displayed
Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to... - Dan
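A minimal sketch of the first of those filters, assuming a simple list of report records with a made-up "arch" field (this is not the actual test.winehq.org data layout):

# Hypothetical report records; the real data behind test.winehq.org
# is not stored this way.
reports = [
    {"machine": "vista-box", "arch": "x86", "build": "wine-1.1.19"},
    {"machine": "vista64-vm", "arch": "x86_64", "build": "wine-1.1.19"},
]

# Rule 1: keep only the 32-bit reports.
reports_32 = [r for r in reports if r["arch"] != "x86_64"]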
Dan Kegel wrote:
> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
> I can think of two sources of noise:
> - 32 and 64 bit results are mixed together
> - we don't have a stable stable (sic) of machines running the tests
> Removing these two sources of noise might be as simple as
> - omit 64 bit results, and
> - omit computers for which results are not consistently available throughout the time range being displayed
> Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to...
> - Dan
So what constitutes a stable machine? I for one am using VMware for most of my boxes, which are clean installs kept up-to-date with patches and such. After running winetest they are reverted to the last known-good snapshot (winetest still leaves a lot of rubbish around that could potentially influence the next run).
Side note: One thing I'd also like to see at a glance on test.winehq.org is whether we are dealing with a real box or a virtualized one.
Side note 2: Our per-build index pages are almost 4 MB in size. Splitting things up would also cut that down.
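One possible way to flag that (a sketch only, not something winetest does today; the heuristic relies on the model string that VMware, VirtualBox and Virtual PC guests report):

import subprocess

def looks_virtualized():
    # On VMware/VirtualBox/Virtual PC guests the reported computer model
    # contains a telltale string such as "VMware Virtual Platform".
    try:
        out = subprocess.check_output(["wmic", "computersystem", "get", "model"],
                                      universal_newlines=True)
    except OSError:
        return False
    return any(s in out for s in ("VMware", "VirtualBox", "Virtual Machine"))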
On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> - we don't have a stable stable (sic) of machines running the tests
> So what constitutes a stable machine?
I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies. - Dan
Dan Kegel wrote:
> On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>>> - we don't have a stable stable (sic) of machines running the tests
>> So what constitutes a stable machine?
> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
> - Dan
Ok, got it. So how do we come up with a stable set?
Or would you just like to have some kind of script that fetches the reports for a fixed (configurable) set of boxes and generates pages like test.winehq.org?
On Thu, Apr 9, 2009 at 3:54 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
> Ok, got it. So how do we come up with a stable set?
My suggestion for that was:
- omit computers for which results are not consistently available throughout the time range being displayed
> Or would you just like to have some kind of script that fetches the reports for a fixed (configurable) set of boxes and generates pages like test.winehq.org?
A quick and dirty way would be to take the same script that generates test.winehq.org, change it a bit to omit 64 bit results and computers that didn't report every time, and save its output as test.winehq.org/data/index-filtered.html or something. - Dan
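A rough sketch of that quick-and-dirty filter, assuming the per-build reports can be loaded as sets of machine names (the loading and page-rendering steps are placeholders, not the real winetest site scripts):

def consistent_machines(machines_by_build):
    # machines_by_build: build id -> set of machine names that reported
    # for that build (hypothetical layout). A machine is kept only if it
    # reported for every build in the displayed range.
    sets = list(machines_by_build.values())
    return set.intersection(*sets) if sets else set()

def write_filtered_index(machines_by_build, render_page):
    keep = consistent_machines(machines_by_build)
    filtered = {build: names & keep
                for build, names in machines_by_build.items()}
    # render_page stands in for whatever generates the normal index page.
    with open("index-filtered.html", "w") as f:
        f.write(render_page(filtered))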
Paul Vriens wrote:
> Dan Kegel wrote:
>> On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>>>> - we don't have a stable stable (sic) of machines running the tests
>>> So what constitutes a stable machine?
>> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
>> - Dan
> Ok, got it. So how do we come up with a stable set?
Sorry, just read your message again and you mentioned 'consistently available'. That in itself shouldn't be too hard to code, but people do have holidays ;)
2009/4/9 Dan Kegel <dank@kegel.com>:
> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
> I can think of two sources of noise:
> - 32 and 64 bit results are mixed together
> - we don't have a stable stable (sic) of machines running the tests
> Removing these two sources of noise might be as simple as
> - omit 64 bit results, and
Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
> - omit computers for which results are not consistently available throughout the time range being displayed
> Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to...
On Thu, Apr 9, 2009 at 5:09 AM, Rob Shearman <robertshearman@gmail.com> wrote:
>> Removing these two sources of noise might be as simple as
>> - omit 64 bit results, and
> Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
Or just a separate output file, if we're in quick and dirty mode.
Rob Shearman <robertshearman@gmail.com> writes:
> 2009/4/9 Dan Kegel <dank@kegel.com>:
>> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
>> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
>> I can think of two sources of noise:
>> - 32 and 64 bit results are mixed together
>> - we don't have a stable stable (sic) of machines running the tests
>> Removing these two sources of noise might be as simple as
>> - omit 64 bit results, and
> Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
Dan Kegel wrote:
> On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
>> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
How many runs should a box have before it's considered for the 'stable' list? We have two months' worth of data, which roughly means 40 runs.
On Thu, Apr 9, 2009 at 10:50 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
> How many runs should a box have before it's considered for the 'stable' list? We have two months' worth of data, which roughly means 40 runs.
Easy: if it misses >= N reports during the period being displayed, skip it. N=1 would be strict. We'd want to pick the lowest N that left enough data, I suspect.
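Picking that N could be as simple as counting missed reports per machine and taking the smallest threshold that still keeps a reasonable number of boxes (a sketch with made-up names; the minimum machine count is whatever we decide "enough data" means):

def pick_threshold(missed_counts, min_machines):
    # missed_counts: machine name -> number of reports it missed during
    # the displayed period. A machine is skipped if it missed >= N reports;
    # return the lowest N that still keeps at least min_machines boxes.
    worst = max(missed_counts.values(), default=0)
    for n in range(1, worst + 2):
        kept = [m for m, missed in missed_counts.items() if missed < n]
        if len(kept) >= min_machines:
            return n
    return None  # no threshold keeps enough machines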
Dan Kegel <dank@kegel.com> writes:
> On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
>> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
I don't see the point. If you want statistics for a specific machine, sure, we could generate a page for that. But the global index should show all the results we have. If it shows too many failures it means the tests are not robust enough across machines and should be fixed, not ignored.
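If the goal is to spot the tests that are not robust across machines, a cross-machine summary for one build might be more useful than filtering machines out; again just a sketch with a made-up input layout:

def least_robust_tests(results):
    # results: list of (machine, test_name, failure_count) tuples for one
    # build (hypothetical layout). Rank tests by how many distinct machines
    # they failed on, most widespread first.
    failing_machines = {}
    for machine, test, failures in results:
        if failures:
            failing_machines.setdefault(test, set()).add(machine)
    return sorted(failing_machines,
                  key=lambda t: len(failing_machines[t]), reverse=True)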