https://bugs.winehq.org/show_bug.cgi?id=55786
--- Comment #6 from Esme Povirk <madewokherd@gmail.com> ---
I changed it from the median of 10 to 100 values, and that seems to be more accurate. Since I'm calculating deltas, those give me a sample size of 9 and 99 respectively.
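As a rough illustration of the measurement described above (not the actual test code), the median of consecutive deltas could be computed like this. The function name and the timestamps are hypothetical; the 15.625 ms tick is only used as sample data.

```python
import statistics

def median_delta(timestamps_ms):
    """Median of the gaps between consecutive timestamps (N values -> N-1 deltas)."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return statistics.median(deltas)

# Ten hypothetical timestamps (ms) on a 15.625 ms tick, with one late reading.
stamps = [i * 15.625 for i in range(10)]
stamps[5] += 3.0  # one gap becomes longer and the following one shorter

print(len(stamps) - 1)       # 9 deltas from 10 values
print(median_delta(stamps))  # 15.625
```

The median ignores the single disturbed pair of deltas, which is presumably why it is preferred over a mean here; more samples make it less likely that outliers reach the middle of the sorted list.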
Interestingly, the current value/threshold on Windows accepts any duration from 13.5 ms to 30.3 ms, because the "64" and "43" values +/- 10 overlap. I think this is a clear indication that the current way of measuring makes no sense. It also explains why I'm getting more failures with the new measurement: I'm actually looking for a much narrower range.
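The 13.5 ms and 30.3 ms figures can be reproduced with a small sketch, under the assumption (not stated in the comment) that "64" and "43" are tick counts per 1000 ms and that each is accepted within +/- 10. The helper name is hypothetical.

```python
def accepted_ms(expected_count, tolerance, interval_ms=1000.0):
    """Duration range (ms) implied by accepting expected_count +/- tolerance ticks.

    Assumes the count is the number of ticks seen in interval_ms.
    """
    shortest = interval_ms / (expected_count + tolerance)
    longest = interval_ms / (expected_count - tolerance)
    return shortest, longest

for expected in (64, 43):
    lo, hi = accepted_ms(expected, 10)
    print(f"{expected} +/- 10 -> {lo:.2f} ms to {hi:.2f} ms")
# 64 +/- 10 -> 13.51 ms to 18.52 ms
# 43 +/- 10 -> 18.87 ms to 30.30 ms
```

Under that assumption the two windows together accept every integer count from 33 to 74, which matches the roughly 13.5 ms to 30.3 ms range quoted above.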
The lowest I've observed so far on testbot is 13.534 ms. Most results cluster around 15.625 ms. When I was using 9 samples, the minimum was 12.896 ms and the maximum was 20.008 ms. I haven't been able to reproduce the results close to 20 ms since increasing the sample count. This makes me wonder whether 20 ms is an actual value used on Windows, or whether 99 samples aren't enough. I'm tempted to bump it to 499 (which will take about 8 seconds for each of these tests on Windows, since 499 intervals of roughly 15.625 ms come to about 7.8 s).
On Wine, a delay ranging from 9.17 ms to 10.98 ms is expected by the original test, because it uses < instead of <=, so the tolerance is actually +/- 9 rather than +/- 10.
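A minimal sketch of the off-by-one effect, assuming the Wine expectation is 100 ticks per 1000 ms with a nominal tolerance of 10. The value 100 is not stated in the comment; it is inferred because it reproduces the quoted range. The function name is hypothetical.

```python
def passing_counts(expected, tolerance, strict):
    """Integer tick counts that pass the tolerance check."""
    def ok(count):
        diff = abs(count - expected)
        return diff < tolerance if strict else diff <= tolerance
    candidates = range(expected - tolerance - 1, expected + tolerance + 2)
    return [c for c in candidates if ok(c)]

strict = passing_counts(100, 10, strict=True)        # test written with <
inclusive = passing_counts(100, 10, strict=False)    # test written with <=

print(strict[0], strict[-1])        # 91 109
print(inclusive[0], inclusive[-1])  # 90 110
print(1000 / strict[-1], 1000 / strict[0])  # shortest and longest accepted delay in ms
```

With the strict comparison only counts 91 through 109 pass, which corresponds to delays of about 9.17 ms to 10.99 ms.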
The "4 ms" mentioned in the comments still makes no sense to me.