(This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
I can think of two sources of noise:
- 32 and 64 bit results are mixed together
- we don't have a stable stable (sic) of machines running the tests
Removing these two sources of noise might be as simple as
- omit 64 bit results, and
- omit computers for which results are not consistently available throughout the time range being displayed
Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to... - Dan
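A minimal sketch of the first of those filters, assuming a simple list of report records with a made-up "arch" field (this is not the actual test.winehq.org data layout):

# Hypothetical report records; the real data behind test.winehq.org
# is not stored this way.
reports = [
    {"machine": "vista-box", "arch": "x86", "build": "wine-1.1.19"},
    {"machine": "vista64-vm", "arch": "x86_64", "build": "wine-1.1.19"},
]

# Rule 1: keep only the 32-bit reports.
reports_32 = [r for r in reports if r["arch"] != "x86_64"]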
Dan Kegel wrote:
> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
> I can think of two sources of noise:
> - 32 and 64 bit results are mixed together
> - we don't have a stable stable (sic) of machines running the tests
> Removing these two sources of noise might be as simple as
> - omit 64 bit results, and
> - omit computers for which results are not consistently available throughout the time range being displayed
> Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to...
> - Dan
So what constitutes a stable machine? I for one am using VMware for most of my boxes, which are clean installs kept up-to-date with patches and such. After running winetest they are reverted to the last known-good snapshot (winetest still leaves a lot of rubbish around that could potentially influence the next run).
Side note: One thing I'd also like to see at a glance on test.winehq.org is whether we are dealing with a real box or a virtualized one.
Side note 2: Our per-build index pages are almost 4 MB in size. Splitting things up would also cut that down.
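One possible way to flag that (a sketch only, not something winetest does today; the heuristic relies on the model string that VMware, VirtualBox and Virtual PC guests report):

import subprocess

def looks_virtualized():
    # On VMware/VirtualBox/Virtual PC guests the reported computer model
    # contains a telltale string such as "VMware Virtual Platform".
    try:
        out = subprocess.check_output(["wmic", "computersystem", "get", "model"],
                                      universal_newlines=True)
    except OSError:
        return False
    return any(s in out for s in ("VMware", "VirtualBox", "Virtual Machine"))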
On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> - we don't have a stable stable (sic) of machines running the tests
> So what constitutes a stable machine?
I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies. - Dan
Dan Kegel wrote:
> On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>>> - we don't have a stable stable (sic) of machines running the tests
>> So what constitutes a stable machine?
> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
> - Dan
Ok, got it. So how do we come up with a stable set?
Or would you just like to have some kind of script that fetches the reports for a fixed (configurable) set of boxes and generates pages like test.winehq.org?
On Thu, Apr 9, 2009 at 3:54 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
> Ok, got it. So how do we come up with a stable set?
My suggestion for that was:
- omit computers for which results are not consistently available throughout the time range being displayed
> Or would you just like to have some kind of script that fetches the reports for a fixed (configurable) set of boxes and generates pages like test.winehq.org?
A quick and dirty way would be to take the same script that generates test.winehq.org, change it a bit to omit 64 bit results and computers that didn't report every time, and save its output as test.winehq.org/data/index-filtered.html or something. - Dan
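A rough sketch of that quick-and-dirty filter, assuming the per-build reports can be loaded as sets of machine names (the loading and page-rendering steps are placeholders, not the real winetest site scripts):

def consistent_machines(machines_by_build):
    # machines_by_build: build id -> set of machine names that reported
    # for that build (hypothetical layout). A machine is kept only if it
    # reported for every build in the displayed range.
    sets = list(machines_by_build.values())
    return set.intersection(*sets) if sets else set()

def write_filtered_index(machines_by_build, render_page):
    keep = consistent_machines(machines_by_build)
    filtered = {build: names & keep
                for build, names in machines_by_build.items()}
    # render_page stands in for whatever generates the normal index page.
    with open("index-filtered.html", "w") as f:
        f.write(render_page(filtered))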
Paul Vriens wrote:
> Dan Kegel wrote:
>> On Thu, Apr 9, 2009 at 3:41 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>>>> - we don't have a stable stable (sic) of machines running the tests
>>> So what constitutes a stable machine?
>> I wasn't complaining about unstable machines; I was complaining that the set of machines reporting test results varies.
>> - Dan
> Ok, got it. So how do we come up with a stable set?
Sorry, just read your message again and you mentioned 'consistently available'. That in itself shouldn't be too hard to code, but people do have holidays ;)
2009/4/9 Dan Kegel <dank@kegel.com>:
> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
> I can think of two sources of noise:
> - 32 and 64 bit results are mixed together
> - we don't have a stable stable (sic) of machines running the tests
> Removing these two sources of noise might be as simple as
> - omit 64 bit results, and
Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
> - omit computers for which results are not consistently available throughout the time range being displayed
> Shouldn't be too hard for someone to whip together an alternate report that did that. Wish I had the time to...
On Thu, Apr 9, 2009 at 5:09 AM, Rob Shearman <robertshearman@gmail.com> wrote:
>> Removing these two sources of noise might be as simple as
>> - omit 64 bit results, and
> Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
Or just a separate output file, if we're in quick and dirty mode.
Rob Shearman <robertshearman@gmail.com> writes:
> 2009/4/9 Dan Kegel <dank@kegel.com>:
>> (This was last discussed in February, e.g. http://www.winehq.org/pipermail/wine-devel/2009-February/073060.html )
>> The results on test.winehq.org seem more variable than one would expect, which makes it harder to gauge wine's progress.
>> I can think of two sources of noise:
>> - 32 and 64 bit results are mixed together
>> - we don't have a stable stable (sic) of machines running the tests
>> Removing these two sources of noise might be as simple as
>> - omit 64 bit results, and
> Not a bad idea, but I would suggest classifying these machines as a different category (I would suggest "other" for the moment) rather than omitting the results.
Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
Dan Kegel wrote:
> On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
>> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
How many runs should a box have before it's considered for the 'stable' list? We have two months' worth of data, which roughly means 40 runs.
On Thu, Apr 9, 2009 at 10:50 AM, Paul Vriens <paul.vriens.wine@gmail.com> wrote:
>> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
> How many runs should a box have before it's considered for the 'stable' list? We have two months' worth of data, which roughly means 40 runs.
Easy: if it misses >= N reports during the period being displayed, skip it. N=1 would be strict. We'd want to pick the lowest N that left enough data, I suspect.
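Picking that N could be as simple as counting missed reports per machine and taking the smallest threshold that still keeps a reasonable number of boxes (a sketch with made-up names; the minimum machine count is whatever we decide "enough data" means):

def pick_threshold(missed_counts, min_machines):
    # missed_counts: machine name -> number of reports it missed during
    # the displayed period. A machine is skipped if it missed >= N reports;
    # return the lowest N that still keeps at least min_machines boxes.
    worst = max(missed_counts.values(), default=0)
    for n in range(1, worst + 2):
        kept = [m for m, missed in missed_counts.items() if missed < n]
        if len(kept) >= min_machines:
            return n
    return None  # no threshold keeps enough machines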
Dan Kegel <dank@kegel.com> writes:
> On Thu, Apr 9, 2009 at 10:17 AM, Alexandre Julliard <julliard@winehq.org> wrote:
>> Having the 64-bit results in the same platform group as the 32-bit ones is actually very helpful, it makes comparing them a lot easier. I don't think we want to split them.
> But suppressing or splitting the results from machines that don't have a full set of results shouldn't hurt...
I don't see the point. If you want statistics for a specific machine, sure, we could generate a page for that. But the global index should show all the results we have. If it shows too many failures it means the tests are not robust enough across machines and should be fixed, not ignored.
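If the goal is to spot the tests that are not robust across machines, a cross-machine summary for one build might be more useful than filtering machines out; again just a sketch with a made-up input layout:

def least_robust_tests(results):
    # results: list of (machine, test_name, failure_count) tuples for one
    # build (hypothetical layout). Rank tests by how many distinct machines
    # they failed on, most widespread first.
    failing_machines = {}
    for machine, test, failures in results:
        if failures:
            failing_machines.setdefault(test, set()).add(machine)
    return sorted(failing_machines,
                  key=lambda t: len(failing_machines[t]), reverse=True)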