On Tue, 24 Mar 2020, Zebediah Figura wrote: [...]
- This means that based on just a few events one cannot expect the interval between most events to fall within a narrow range. For instance, if the acceptable interval is 190-210 ms and the first interval is 237 ms instead, then the next one will necessarily be out of range too, and likely the one after that as well. So expecting 2 out of 3 intervals to be within range is no more reliable than checking a single interval.
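(To make that alternative concrete, here is a rough stand-alone sketch of such a 2-out-of-3 check; the names and the Sleep() stand-in for the timer callbacks are mine, not the actual test code.)

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD t[4], good = 0;
        int i;

        for (i = 0; i < 4; i++)
        {
            if (i) Sleep(200);      /* stand-in for a 200 ms timer callback */
            t[i] = GetTickCount();
        }
        for (i = 1; i < 4; i++)
        {
            DWORD interval = t[i] - t[i - 1];
            printf("interval %d: %lu ms\n", i, interval);
            if (interval >= 190 && interval <= 210) good++;
        }
        /* On a loaded VM the delay tends to hit several intervals at once,
         * so requiring 2 of 3 to be in range is barely more reliable than
         * requiring all of them. */
        printf("%lu of 3 intervals in 190-210 ms -> %s\n", good,
               good >= 2 ? "PASS" : "FAIL");
        return 0;
    }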
Allowing for more error than 10 ms seems reasonable to me, even by an order of magnitude.
The test tolerances are not that tight, as far as I know, and certainly not for this threadpool timer test. That was just me testing an alternative approach and finding it not to be viable. As I said, in this specific case the allowed range is 500-750 ms for an expected 600 ms (3 * 200 ms).
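For reference, here is a rough stand-alone sketch of that kind of check, using the public Win32 threadpool timer API; this is my illustration, not the actual Wine test code:

    #include <windows.h>
    #include <stdio.h>

    static HANDLE done_event;
    static LONG fired;

    static VOID CALLBACK timer_cb(PTP_CALLBACK_INSTANCE instance, PVOID context, PTP_TIMER timer)
    {
        /* signal the main thread once the third period has elapsed */
        if (InterlockedIncrement(&fired) == 3) SetEvent(done_event);
    }

    int main(void)
    {
        LARGE_INTEGER due;
        FILETIME ft;
        PTP_TIMER timer;
        DWORD start, elapsed;

        done_event = CreateEventW(NULL, TRUE, FALSE, NULL);
        timer = CreateThreadpoolTimer(timer_cb, NULL, NULL);

        due.QuadPart = (LONGLONG)-200 * 10000; /* negative = relative due time: 200 ms in 100 ns units */
        ft.dwLowDateTime = due.u.LowPart;
        ft.dwHighDateTime = due.u.HighPart;

        start = GetTickCount();
        SetThreadpoolTimer(timer, &ft, 200 /* ms period */, 0);
        WaitForSingleObject(done_event, 10000);
        elapsed = GetTickCount() - start;

        /* 3 * 200 ms nominal; anything in 500-750 ms is accepted */
        printf("3 periods took %lu ms -> %s\n", elapsed,
               (elapsed >= 500 && elapsed <= 750) ? "ok" : "flaky");

        SetThreadpoolTimer(timer, NULL, 0, 0);          /* cancel the timer */
        WaitForThreadpoolTimerCallbacks(timer, TRUE);   /* drain pending callbacks */
        CloseThreadpoolTimer(timer);
        CloseHandle(done_event);
        return 0;
    }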
But there are cases in other tests where we do a TerminateProcess() or similar and expect the WaitForSingleObject() to return within 100 ms. I don't think those are correct. Even 1 s feels short. The recent kernel32:process helper functions replaced a bunch of them with wait_child_process() calls, so now the timeout is 30 s. I may align the remaining timeouts with that... though I feel 30 s is a bit large. Surely 10 s should be enough?
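As an illustration, here is roughly what the safer pattern looks like; wait_child_process() is the real helper mentioned above, but this simplified stand-in (and its 10 s timeout) is just my sketch:

    #include <windows.h>
    #include <stdio.h>

    /* Illustrative only: the real helper is wait_child_process() in the
     * kernel32:process tests. This stand-in just shows why the timeout
     * should be generous rather than 100 ms. */
    static void terminate_and_wait(HANDLE process)
    {
        DWORD ret;

        TerminateProcess(process, 1);
        /* a loaded host can take a while to reap the child, so waiting
         * only 100 ms gives spurious failures; 10 s is far safer */
        ret = WaitForSingleObject(process, 10000);
        if (ret != WAIT_OBJECT_0)
            printf("child did not terminate in time (ret=%lu)\n", ret);
    }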
[...]
In QEMU, when the timer misses it often misses big: 437 ms, 687 ms, even 1469 ms. So most of the time expecting three events to take about three intervals does not help with reliability: the timer does not try to compensate for the missed events, so at the end the total will still be off by one interval (200 ms) or more.
I could not reproduce these big misses with Windows 8.1 on the cw-rx460 machine (i.e. real hardware).
This is the real problem, I guess. I mean, the operating system makes no guarantees about timers firing on time, of course, but when we try to wait for events to happen and they're frequently late by over a second, that makes things very difficult to test.
Is it possible the CPU is under heavy load?
Not really, no. There's really not much running on the VM hosts:
* VMs: We run at most one VM at a time per host, precisely to make sure the activity in one VM does not interfere with the tests running in the other VM(s). Of course this makes the TestBot pretty inefficient, and it also does not prevent these delays :-(
* Unattended upgrades: Once a day apt checks for security updates and installs them. But on Debian stable that should not amount to much.
* Acts of the administrator: Mostly VM backups/restores, debugging, reconfiguring. But these are too infrequent to explain all the delays we get.
Also I'm not convinced CPU load on the host is the cause of these delays.