https://bugs.winehq.org/show_bug.cgi?id=55786
Bug ID: 55786
Summary: user32:msg - test_timers() sometimes gets a shorter timeout than expected in Wine
Product: Wine
Version: unspecified
Hardware: x86-64
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: user32
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
user32:msg - test_timers() sometimes gets a shorter timeout than expected in Wine:
11622: Test failed: did not get expected count for minimum timeout (90 != ~100).
See https://test.winehq.org/data/patterns.html#user32:msg
This first happened on 2023-07-14 (no instance in the previous 5 months) and has happened about 8 times per month since. It does not happen on any of the TestBot VMs, however. So far it has only happened on the GitLab CI (impacting 12 MRs), on Remi's macOS machines, and once on my fg-deb64 desktop.
Here are the October instances:
* 2023-10-04 linux_gitlab-debian-64
* 2023-10-06 mac_rbernon-macos-opengl-win64_1
* 2023-10-10 mac_rbernon-macos-opengl-wow64
* 2023-10-10 MR!4024, MR!4042
* 2023-10-11 MR!4064, MR!4075
* 2023-10-12 MR!4071
The lowest value in the failure messages (including the MRs) is 86 (2023-07-14 mac_rbernon-macos-opengl-win64_2). All other values are either 89 or 90. So TIMER_COUNT_TOLERANCE = 10 was not far off, but maybe something around 15.6 would be more accurate.
That said, this test is old (2013-2019), so it is strange that it only started failing last July.
François Gouget fgouget@codeweavers.com changed:
What          |Removed |Added
----------------------------------------------------------------------------
CC            |        |madewokherd@gmail.com, rbernon@codeweavers.com
Keywords      |        |source, testcase
--- Comment #1 from Esme Povirk madewokherd@gmail.com --- This is set up in a very confusing way IMO: it waits for 1000 ms, rounded up to the next "GetTickCount interval", and counts the number of times WM_TIMER is posted, which, apparently, is expected to be 100 for a 10 ms timer. So in order to figure out the difference *in interval* we have to divide 1000 ms by the count, which (for 86) gets us an interval of ~11.6 ms. So I guess this has to do with high system load, where the system is not able to handle the timeouts fast enough.
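The arithmetic above can be sketched as follows (a Python illustration, not the actual C test code; the 1000 ms window and the expected/failing counts are taken from the messages above):

```python
# Back out the effective timer interval from a WM_TIMER count:
# the test counts WM_TIMER messages over a fixed ~1000 ms window,
# so interval ≈ window / count.

WINDOW_MS = 1000.0  # the test's wait duration

def effective_interval_ms(count):
    """Effective per-tick interval implied by a message count."""
    return WINDOW_MS / count

# The expected count for a 10 ms timer:
assert effective_interval_ms(100) == 10.0

# The lowest failing count seen (86) implies an interval of ~11.6 ms:
print(round(effective_interval_ms(86), 1))
```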
I'm trying to figure out if there's a way to measure that doesn't have this problem. There probably isn't.
--- Comment #2 from Esme Povirk madewokherd@gmail.com --- I have no idea what the 9 counts = 4 ms calculation in the comments means.
--- Comment #3 from Esme Povirk madewokherd@gmail.com --- Maybe we should measure the minimum rather than average, and use a more precise function than GetTickCount to do it? My guess would be that each interval isn't 1-2 ms off but there's just a 100 ms pause in there somewhere.
--- Comment #4 from Esme Povirk madewokherd@gmail.com --- Tried basing it on the median difference (as measured by QueryPerformanceCounter) but I'm getting some weird values: https://testbot.winehq.org/JobDetails.pl?Key=138988
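As a sketch of why a median of deltas is worth trying here (a Python stand-in for the C test; the timestamps are fabricated, with a single long pause of the kind hypothesized in comment #3):

```python
import statistics

def median_delta_ms(timestamps_ms):
    """Median of consecutive differences: n timestamps give n-1 deltas,
    e.g. 10 samples yield a sample size of 9."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return statistics.median(deltas)

# 10 timestamps 10 ms apart, with one 100 ms pause in the middle:
# the median delta is unaffected by the single outlier, unlike the
# average (which would come out to 20 ms here).
ts = [0, 10, 20, 30, 130, 140, 150, 160, 170, 180]
print(median_delta_ms(ts))  # 10
```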
--- Comment #5 from Esme Povirk madewokherd@gmail.com --- It seems like QPC median is actually worse.
--- Comment #6 from Esme Povirk madewokherd@gmail.com --- I changed it from the median of 10 to 100 values, and that seems to be more accurate. Since I'm calculating deltas, those give me a sample size of 9 and 99 respectively.
Interestingly, the current value/threshold on Windows accepts any duration from 13.5 ms to 30.3 ms, because the "64" and "43" values +/- 10 overlap. I think this is a clear indication that the current way of measuring makes no sense. It explains why I'm getting more failures this way, because I'm actually looking for a much narrower range.
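The 13.5 ms and 30.3 ms endpoints can be reproduced by converting each count tolerance back into the interval range it accepts, assuming the same 1000 ms window as above (a hypothetical Python sketch, not the test code):

```python
# Convert a count tolerance into the range of effective intervals it
# accepts, assuming a ~1000 ms measurement window.
WINDOW_MS = 1000.0

def accepted_interval_range_ms(expected_count, tolerance):
    """More counts in the window means a shorter interval, and vice versa."""
    lo = WINDOW_MS / (expected_count + tolerance)
    hi = WINDOW_MS / (expected_count - tolerance)
    return lo, hi

# The two Windows expected counts mentioned above, each +/- 10:
lo1, hi1 = accepted_interval_range_ms(64, 10)  # ~13.5 .. ~18.5 ms
lo2, hi2 = accepted_interval_range_ms(43, 10)  # ~18.9 .. ~30.3 ms
print(round(lo1, 1), round(hi2, 1))  # 13.5 30.3
```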
The lowest I've observed so far on testbot is 13.534. Most results cluster around 15.625 ms. When I was using 9 samples, the minimum was 12.896 and the maximum was 20.008. I haven't been able to reproduce the results I got close to 20 ms since increasing the sample count. This makes me wonder if 20 ms is an actual value used on Windows, or if 99 samples isn't enough. Tempted to bump it to 499 (which will take about 8 seconds for each of these tests on Windows).
On Wine, a delay ranging from 9.17 ms to 10.98 ms is expected by the original test, because it uses < instead of <= and so the tolerance is actually +/- 9.
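The 9.17 and 10.98 figures follow from the same window arithmetic (a Python sketch; the effective count range 91..109 is an assumption based on the +/- 9 tolerance described above):

```python
# With '<' rather than '<=', a nominal tolerance of 100 +/- 10 really
# accepts counts 91..109 over a 1000 ms window, i.e. intervals of
# roughly 9.17 ms to 10.99 ms.
WINDOW_MS = 1000.0
print(round(WINDOW_MS / 109, 2), round(WINDOW_MS / 91, 2))  # 9.17 10.99
```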
The "4 ms" mentioned in the comments still makes no sense to me.
--- Comment #7 from Esme Povirk madewokherd@gmail.com --- Even at 499 samples, this dips uncomfortably close to 9.0 ms on Wine.
--- Comment #8 from Esme Povirk madewokherd@gmail.com --- None of the Windows test machines seem to use 10 ms timers, except for win7, which uses them *only for HWND timers, not thread timers*. It's really odd to me that they'd be different, and consistently so.
In one Wine test run I got 9.077 ms, 9.034 ms, and 9.045 ms, when we know Wine is supposed to be using 10 ms timers, so that makes me think the lag is spread evenly and can't be filtered out of the data.
--- Comment #9 from Esme Povirk madewokherd@gmail.com --- It's also odd that with QPC I *consistently* get durations less than 10 ms, even though Wine is aiming for 10, and even on Windows 7. One would expect that, because it takes time to dispatch events, durations should be more than 10 ms.
--- Comment #10 from Esme Povirk madewokherd@gmail.com --- Turns out I forgot to update the index of the median, so I was actually getting something very close to the minimum when I increased the sample count. This makes much more sense of my results, which are now (mostly) closer to what I'd expect. In particular, I'm getting timings both greater and less than 10 ms on Wine, instead of consistently less.
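A sketch of the off-by-index effect (a Python illustration with synthetic Gaussian deltas; the actual test code is C, and the indexing scheme here is just an assumed shape of the bug):

```python
import random

# If the deltas are sorted and the median is taken by index, growing the
# sample size from 9 to 99 while leaving the index at 9 // 2 == 4 makes
# the "median" really the 5th-smallest delta, i.e. very close to the
# minimum.
random.seed(1)
deltas = sorted(random.gauss(10.0, 1.0) for _ in range(99))

stale_index = 9 // 2     # index left over from the 9-sample version
correct_index = 99 // 2  # the actual median index for 99 samples

print(deltas[stale_index] < deltas[correct_index])  # True
```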
I've *also* started getting wonky values again from the Windows VMs. The fact that this didn't happen when using minimums suggests to me that 15 ms is generally accurate on Windows machines, and the values closer to 20 ms were errors.
I think I'll see if increasing the sample size helps on the Windows VMs, and if not I'll just stick with a large range like we already had.