TL;DR:
* Windows is bad at hitting precise timing targets.
* The 15.6 ms Windows timer granularity can have an impact on the tests' reliability.
* But some unknown factor is causing large delays which are the source of most timing-related failures.
* The same delays probably also cause the "wait timed out" failures.
* Avoiding these failures in the timer tests is tricky.
-> Find the source of these delays and figure out how to avoid them.
My test subject for this dive was test_tp_timer() in ntdll:threadpool, and more specifically the loop which we expect to run in 500 to 750 ms:
https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/ntdll/tests/threadpoo...
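For reference, the pattern being timed looks roughly like the sketch below. Note that this is only an approximation: it uses the documented Win32 threadpool API rather than the raw ntdll Tp* calls the real test goes through, and the 200 ms period / three-event count merely match the numbers discussed below, not necessarily the test's exact parameters.

#define _WIN32_WINNT 0x0600 /* the threadpool API needs Vista+ */
#include <stdio.h>
#include <windows.h>

static HANDLE semaphore;

static VOID CALLBACK timer_callback(PTP_CALLBACK_INSTANCE instance, PVOID context, PTP_TIMER timer)
{
    ReleaseSemaphore(semaphore, 1, NULL);
}

int main(void)
{
    PTP_TIMER timer;
    ULARGE_INTEGER due;
    FILETIME due_ft;
    DWORD start, prev, now;
    int i;

    semaphore = CreateSemaphoreA(NULL, 0, 16, NULL);
    timer = CreateThreadpoolTimer(timer_callback, NULL, NULL);

    due.QuadPart = (ULONGLONG)-(200 * 10000LL); /* relative due time: -200 ms in 100 ns units */
    due_ft.dwLowDateTime = due.LowPart;
    due_ft.dwHighDateTime = due.HighPart;
    SetThreadpoolTimer(timer, &due_ft, 200, 0); /* then fire every 200 ms */

    start = prev = GetTickCount();
    for (i = 0; i < 3; i++)
    {
        WaitForSingleObject(semaphore, 1000);
        now = GetTickCount();
        printf("event %d after %lu ms (total %lu ms)\n", i + 1, now - prev, now - start);
        prev = now;
    }
    /* in Wine the three events reliably take ~600 ms; on Windows the total varies a lot more */

    CloseThreadpoolTimer(timer);
    CloseHandle(semaphore);
    return 0;
}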
* Recurring timers are nice because one knows exactly how often they should generate an event, which makes it easy to see how badly and how often they miss. This is unlike most WaitForSingleObject() calls, where one does not know how much time will elapse and all one gets is the unpredictable one-off timeout.
* Wine on Linux is waaay better than Windows at ensuring the timer events arrive precisely on time. Even in a QEmu VM on an otherwise busy machine, every event arrives within 1 ms of the expected time (so in this case, in Wine we go through the loop once every 200 ms and rarely 199 or 201 ms).
* On Windows the default timer resolution is 15.625 ms. So asking for a 200 ms timer interval as in this test is asking for trouble: Windows will round that up to 13 * 15.625 ms = 203.125 ms and then occasionally make the interval shorter so the average stays around 200 ms. So we typically get a pattern of 203, 203, 203, 203, 188 ms, then 203 again... (the small simulation after this list of observations shows where this pattern comes from).
* Even on real hardware (Windows 8.1 on cw-rx460), Windows can sometimes get into a rut, alternating between 172 ms and 219 ms for instance. So if I tweak the test to go through more loops I'd sometimes get only 4 out of 14 events arriving within 190-210 ms of the previous one! Whether this happens is heavily influenced by the other processes running on the machine, such as playing a video in IE.
* This means that based on just a few events one cannot expect the interval between most events to fall within a narrow range. So here for instance if the acceptable interval is 190-210 ms and the first interval is instead 237 ms, then the next one will necessarily be out of range too, and likely the one after that too. So expecting 2 out of 3 intervals to be within the range is no more reliable than checking just one interval.
* The timer resolution can be changed with NtSetTimerResolution() and increased to 0.5 ms (a call sketch follows at the end of this list). But that does not change the results: one still gets 203, 203, 203, 203, 188 ms. My understanding is that increasing the timer resolution really only impacts timers that request a higher resolution. But that means testing a different API.
* At least increasing the timer resolution seems to prevent Windows from getting stuck alternating between 172 and 219 ms.
* Interestingly if one starts a Disk Cleanup and then runs "Clean up system files", the analysis of the system files drops the timer interval to 0.5 ms! So trying to see the impact of disk activity can modify the results in unexpected ways.
* One can check the current timer resolution with the ClockRes command line tool: https://docs.microsoft.com/en-us/sysinternals/downloads/clockres
* When the delay is greater than the timer interval, e.g. 437 ms, the timer does not try to compensate for the lost events. But it does try to compensate for the fractional interval remainder. So one sees:
       Expected  Actual
Loop   Time      Time     Interval  Lateness
 1      200       437       437     200+37
 2      400       609       172     200+9
 3      600       812       203     200+12
 4      800      1000       188     200
* In QEmu, when the timer misses it often misses big: 437 ms, 687 ms, even 1469 ms. So most of the time expecting three events to take about 3 intervals does not help with reliability because the timer does not try to compensate for the missed events. So in the end it will still be off by one interval (200 ms) or more.
* I could not reproduce these big misses on the Windows 8.1 on cw-rx460 machine (i.e. real hardware).
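To make the 203/188 pattern above more concrete, here is a tiny simulation (pure arithmetic, no Windows API). It assumes each due time is simply rounded up to the next 15.625 ms tick, which is just my reading of the observed numbers, not documented behavior:

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double tick = 15.625;   /* default Windows timer resolution in ms */
    const double period = 200.0;  /* requested timer period in ms */
    double prev = 0.0;
    int i;

    for (i = 1; i <= 5; i++)
    {
        double due = i * period;                  /* ideal firing time */
        double actual = ceil(due / tick) * tick;  /* rounded up to the next tick */
        printf("event %d: due %4.0f ms, fires at %8.3f ms, interval %.0f ms\n",
               i, due, actual, actual - prev);
        prev = actual;
    }
    return 0;
}

This prints intervals of 203, 203, 203, 203 and then 188 ms because 1000 ms happens to fall exactly on a tick (64 * 15.625 ms), so the accumulated rounding is given back in one go.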
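Similarly, here is a sketch of the NtSetTimerResolution() experiment mentioned above. Both functions are undocumented ntdll exports taking 100 ns units, so they are resolved at run time; the surrounding program is my own scaffolding, not the test's code:

#include <stdio.h>
#include <windows.h>

/* undocumented ntdll exports; resolutions are in 100 ns units */
typedef LONG (WINAPI *NTQUERYTIMERRESOLUTION)(ULONG *min, ULONG *max, ULONG *current);
typedef LONG (WINAPI *NTSETTIMERRESOLUTION)(ULONG desired, BOOLEAN set, ULONG *current);

int main(void)
{
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    NTQUERYTIMERRESOLUTION pNtQueryTimerResolution =
        (NTQUERYTIMERRESOLUTION)GetProcAddress(ntdll, "NtQueryTimerResolution");
    NTSETTIMERRESOLUTION pNtSetTimerResolution =
        (NTSETTIMERRESOLUTION)GetProcAddress(ntdll, "NtSetTimerResolution");
    ULONG min_res, max_res, cur_res;

    if (!pNtQueryTimerResolution || !pNtSetTimerResolution) return 1;

    pNtQueryTimerResolution(&min_res, &max_res, &cur_res);
    printf("min %lu, max %lu, current %lu (100 ns units)\n", min_res, max_res, cur_res);

    pNtSetTimerResolution(5000, TRUE, &cur_res); /* 5000 * 100 ns = 0.5 ms */
    printf("current resolution is now %.3f ms\n", cur_res / 10000.0);
    return 0;
}

(As far as I can tell the higher resolution only lasts while the process keeps it requested.)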
* Conclusion:
- Because Windows constantly adjusts the timer intervals so that they are only accurate *on average*, testing the total time for N events will be more reliable than testing the events individually (see the sketch after this conclusion list).
- But the occasional big delays mean any timing tolerances below ~1700 ms will be somewhat unreliable :-(
- For timers one could check if a big delay happened in one of the loops and either account for it or reset the timer and retry in the hope no delay will occur then. But that's overly complex.
- The same big delays are probably also the cause of the WaitForSingleObject() failures when the timeout is < 2000 ms.
- The timing in Wine in Debian VMs is very reliable. Yet it is likely that QEmu is somehow responsible for the timing issues in the Windows VMs.
- The best lead at this point is to check whether some VM clock setting is causing this. But some settings are also needed to prevent reverts from taking forever[1].

(w8)
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='delay'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>

vs. (w1064)
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
  </hyperv>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup' track='guest'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>

vs. (fgtbdebian10)
  <clock offset='utc'/>
- Otherwise try CPU pinning. But Linux seems to do really well without it...
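To illustrate the first conclusion point, this is what the "test the total time for N events" approach could look like, in Wine-test style. It is only a fragment: it assumes a periodic timer whose callback releases 'semaphore' (as in the first sketch), and the 500-750 ms bounds are simply the ones mentioned at the top, not necessarily the right tolerances:

    DWORD start, elapsed, ret;
    int i;

    start = GetTickCount();
    for (i = 0; i < 3; i++)
    {
        ret = WaitForSingleObject(semaphore, 1000); /* generous per-event safety timeout */
        ok(ret == WAIT_OBJECT_0, "wait %d timed out\n", i);
    }
    elapsed = GetTickCount() - start;
    /* check the sum of the intervals rather than each one individually */
    ok(elapsed >= 500 && elapsed <= 750, "the timer loop took %lu ms\n", elapsed);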
[1] https://bugs.launchpad.net/qemu/+bug/1505041
    https://bugs.launchpad.net/qemu/+bug/1174654