On 7/4/20 12:33 AM, Rémi Bernon wrote:
On 2020-07-03 17:05, Rémi Bernon wrote:
On 2020-07-03 15:48, Rémi Bernon wrote:
On 2020-07-03 15:32, Francois Gouget wrote:
On Fri, 3 Jul 2020, Rémi Bernon wrote: [...]
Ah indeed, I thought I could add more arguments to run them all.
The monitor test causes a Xephyr segmentation fault locally, but the msg test seems to run fine. I wonder if the problem could be snowballing from the monitor test.
I don't see X error messages in the "full task log". But if X segfaulted directly that would probably not be visible there.
I never tested it, but if X segfaults I suspect we'd get back to the chooser, which should reconnect the default user after the standard timeout (<30 seconds), and the tests are likely to continue running in the meantime. It's possible they could get stuck while X is restarting, which could explain the string of timeouts.
It may be possible to test for this condition by starting an independent Windows process that displays a window in an early test (CreateProcess("clock.exe")?), and then checking the final screenshot to see if that window is still there.
I cancelled the remaining tests as they were taking way too long to time out. I'm also investigating the monitor test, as the other tests run fine separately (at least msg, as shown here: https://testbot.winehq.org/JobDetails.pl?Key=74749). My local Xephyr crash actually seems unrelated (sadly).
It looks like applying the mode changes by calling
ChangeDisplaySettingsExA(NULL ...)
sometimes takes an unusually long time, for instance up to ~100 s for a single mode change, as shown here:
https://testbot.winehq.org/JobDetails.pl?Key=74774&f101=win32.report#k10...
With all the modes being iterated, it's no wonder that it ends up timing out. I also suspect that the timeout may cause additional issues if the process is killed in the middle of a mode change.
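To put numbers on the timeout, one could wrap each mode change in a monotonic timer and compare the worst case against the test's time budget. This is only a sketch: time_call(), fits_in_budget() and fake_mode_change() are hypothetical helpers, not Wine code. fake_mode_change() merely sleeps as a stand-in for a ChangeDisplaySettingsExA() call, and the timeout is passed in as a parameter rather than assuming any particular winetest default.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: run fn(arg) once and return the elapsed
 * wall-clock time in seconds, measured with CLOCK_MONOTONIC. */
static double time_call(void (*fn)(void *), void *arg)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical worst-case check: can n_modes mode changes, each taking
 * up to per_mode_s seconds, complete within timeout_s seconds? */
static int fits_in_budget(unsigned n_modes, double per_mode_s, double timeout_s)
{
    return n_modes * per_mode_s <= timeout_s;
}

/* Hypothetical stand-in for a single mode change: sleeps for 10 ms. */
static void fake_mode_change(void *arg)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 };

    (void)arg;
    nanosleep(&ts, NULL);
}
```

At ~100 s per change, fits_in_budget() already fails for two modes against a 120 s budget. Iterating every mode the driver reports therefore cannot finish once a few calls hit the slow path.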
So it all comes down to this call in xrandr12_set_current_mode:
    status = pXRRSetCrtcConfig( gdi_display, resources, resources->crtcs[primary_crtc],
                                CurrentTime, crtc_info->x, crtc_info->y,
                                xrandr12_modes[mode], crtc_info->rotation,
                                crtc_info->outputs, crtc_info->noutput );
which sometimes takes ~100 s. I can see it consistently in the Win32 tests when the modes are restored (even though the display is already in the correct mode). However, I disabled the individual mode tests to reduce the test timeouts, so it may also happen on the other architectures when the whole test is run.
I'm not sure what to do about it...
I thought I had a fix, but further investigation showed that the issue is weirder than I had thought.
The hang happens when Wine calls XRRSetCrtcConfig() with the mode that is already set. The attached patch demonstrates that. However, a test program setting the same mode over and over works fine; see the full log of https://testbot.winehq.org/JobDetails.pl?Key=75608. I also can't reproduce the hang with the same Debian testing TestBot VMs running on my machine, and the hang doesn't appear on the Debian 10 TestBots. The Debian testing TestBot VMs were last updated on June 15th, which is around the time the hangs started happening.
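Since the hang is tied to re-setting the current configuration, one possible mitigation is to skip the call when nothing would change. This would only avoid the hang, not fix its root cause, and it is not what the attached patch does. The sketch below is hypothetical. struct crtc_state loosely mirrors the mode/x/y/rotation fields of XRRCrtcInfo. set_crtc_if_changed() and fake_set() are invented names, with fake_set() standing in for the pXRRSetCrtcConfig() call. The sketch treats a status of 0 as success.

```c
#include <stddef.h>

/* Hypothetical subset of the CRTC state touched by a mode set,
 * loosely modelled on XRRCrtcInfo's mode, x, y and rotation fields. */
struct crtc_state {
    unsigned long mode;      /* stand-in for an RRMode id */
    int x, y;
    unsigned short rotation;
};

typedef int (*set_crtc_fn)(const struct crtc_state *requested);

/* Only call the (possibly hanging) setter when the requested state differs
 * from the current one. Returns 0 when skipped, otherwise the setter's
 * status; the cached state is updated only on success (status 0). */
static int set_crtc_if_changed(struct crtc_state *current,
                               const struct crtc_state *requested,
                               set_crtc_fn set)
{
    if (current->mode == requested->mode && current->x == requested->x &&
        current->y == requested->y && current->rotation == requested->rotation)
        return 0; /* already configured as requested: nothing to do */

    int status = set(requested);
    if (status == 0)
        *current = *requested;
    return status;
}

/* Hypothetical stand-in for the real X call: counts invocations. */
static int set_calls;
static int fake_set(const struct crtc_state *requested)
{
    (void)requested;
    set_calls++;
    return 0;
}
```

Comparing position and rotation as well as the mode keeps the guard from skipping a genuine reconfiguration that happens to reuse the same mode.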
Looking at the source of XRRSetCrtcConfig(), it seems that it was waiting for an XReply that never came. Meanwhile, xf86-video-qxl has had no recent updates. So my theory is that the updated xorg-server or some other component has a race condition that currently only manifests itself on the TestBots.
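As far as I know, Xlib waits for a reply on the display connection with no timeout, so a reply that never arrives looks exactly like this hang. To confirm that diagnosis on a TestBot, one could poll the connection's file descriptor (ConnectionNumber(display) in Xlib) with a deadline instead of blocking. The sketch below is hypothetical and only diagnostic, not a drop-in change to Xlib's reply handling. wait_for_reply() is an invented name, and a pipe stands in for the X connection.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if data (or a hang-up) is pending, 0 on timeout, -1 on error.
 * Note: after EINTR the full timeout restarts, which is fine for a sketch. */
static int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

Used on a pipe whose write end is never written, this returns 0 after the timeout, which is the "reply that never came" case. Once a byte is written, it returns 1.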
The next step would be to investigate it on the TestBots themselves and try an apt-get update. I am not sure what more can be done on my side.