On 7/17/20 12:00 AM, Zhiyi Zhang wrote:
On 7/4/20 12:33 AM, Rémi Bernon wrote:
On 2020-07-03 17:05, Rémi Bernon wrote:
On 2020-07-03 15:48, Rémi Bernon wrote:
On 2020-07-03 15:32, Francois Gouget wrote:
On Fri, 3 Jul 2020, Rémi Bernon wrote: [...]
Ah indeed, I thought I could add more arguments to run them all.
The monitor test causes a Xephyr segmentation fault locally, but the msg test seems to run fine. I wonder if it could be snowballing from the monitor test.
I don't see X error messages in the "full task log". But if X segfaulted directly that would probably not be visible there.
I never tested it, but if X segfaults I suspect we'd get back to the chooser, which should reconnect the default user after the standard timeout (<30 seconds), and the tests are likely to continue running in the meantime. It's possible they get stuck while X is restarting, which could explain the string of timeouts.
It may be possible to test for this condition by starting an independent windows process displaying a window in an early test (CreateProcess("clock.exe")?) and checking the final screenshot to see if that window is still there.
I cancelled the remaining tests as they were taking way too long to time out. I'm also investigating the monitor test, as the other tests run fine separately (at least msg, as shown here: https://testbot.winehq.org/JobDetails.pl?Key=74749). My local Xephyr crash actually seems unrelated (sadly).
It looks like applying the mode changes by calling
ChangeDisplaySettingsExA(NULL ...)
sometimes takes an unusually long time, up to ~100s for a single mode change for instance, as shown here:
https://testbot.winehq.org/JobDetails.pl?Key=74774&f101=win32.report#k10...
With all the modes being iterated, it's no wonder that it ends up timing out. I wonder whether the timeout then causes additional issues, if the process is killed during the mode change.
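To spot which individual calls are slow, one option is to wrap each call in a monotonic timer. This is only a minimal sketch: time_call, is_slow_call, timed_call_fn and sleep_ms_call are hypothetical names, and the callback stands in for whatever performs a single mode change.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() and nanosleep() */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical callback type standing in for the call being measured,
 * for example a wrapper around one display mode change. */
typedef void (*timed_call_fn)(void *ctx);

/* Run fn(ctx) and return the wall-clock time it took, in seconds.
 * CLOCK_MONOTONIC is used so that system clock adjustments do not
 * skew the measurement. */
static double time_call(timed_call_fn fn, void *ctx)
{
    struct timespec start, end;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Return 1 when a call took longer than the threshold (in seconds). */
static int is_slow_call(double elapsed, double threshold)
{
    return elapsed > threshold;
}

/* Illustration-only callback: sleeps for *(int *)ctx milliseconds. */
static void sleep_ms_call(void *ctx)
{
    int ms = *(int *)ctx;
    struct timespec req = { ms / 1000, (long)(ms % 1000) * 1000000L };

    nanosleep(&req, NULL);
}
```

Logging every call for which is_slow_call() returns 1 would show whether the ~100s delays are tied to specific modes or spread across the whole iteration.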
So it all ends up being this call in xrandr12_set_current_mode:
    status = pXRRSetCrtcConfig( gdi_display, resources, resources->crtcs[primary_crtc],
                                CurrentTime, crtc_info->x, crtc_info->y, xrandr12_modes[mode],
                                crtc_info->rotation, crtc_info->outputs, crtc_info->noutput );
that sometimes takes ~100s. I can see it consistently in the Win32 tests, when it restores the modes (although it's already in the correct mode), but I disabled the individual mode tests to reduce the test timeouts, so it may also happen on the other archs with the whole test.
I'm not sure what to do about it...
I thought I had a fix, but further investigation showed that the issue is weirder than I had thought.
The hang happens when Wine calls XRRSetCrtcConfig() with the same mode it already set. The attached patch demonstrates that. But then a test program setting the same mode over and over works fine. See https://testbot.winehq.org/JobDetails.pl?Key=75608 in full log. Also I can't reproduce the hang with the same Debian testing TestBot VMs running on my machine. The hang doesn't appear with Debian 10 TestBots. The last time Debian testing TestBot VMs were updated is June 15th, which is around the time the hangs started happening.
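Since the hang is tied to setting a mode that is already active, one way to sidestep the symptom would be to skip the call when nothing changes. The sketch below only illustrates that idea and is not what Wine does: struct crtc_state, set_mode_fn, set_mode_if_needed and counting_setter are hypothetical names, the setter callback stands in for the XRRSetCrtcConfig() call, and 0 is used as the success status. Given that a standalone program setting the same mode repeatedly works fine, this would hide the problem rather than explain it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the current CRTC state.
 * Only the mode identifier matters for this check, because the other
 * arguments in the call are passed back unchanged from crtc_info. */
struct crtc_state
{
    unsigned long mode;
};

/* Hypothetical setter type standing in for the real mode change call.
 * It returns 0 on success. */
typedef int (*set_mode_fn)(void *ctx, unsigned long mode);

/* Apply wanted_mode only if it differs from the current one. */
static int set_mode_if_needed(struct crtc_state *current, unsigned long wanted_mode,
                              set_mode_fn set_mode, void *ctx)
{
    int status;

    assert(current != NULL && set_mode != NULL);
    if (current->mode == wanted_mode)
        return 0; /* already in the requested mode: skip the round trip */

    status = set_mode(ctx, wanted_mode);
    if (status == 0)
        current->mode = wanted_mode; /* remember what is now active */
    return status;
}

/* Illustration-only setter that counts how often it is called. */
static int counting_setter(void *ctx, unsigned long mode)
{
    (void)mode;
    ++*(int *)ctx;
    return 0;
}
```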
Looking at the source of XRRSetCrtcConfig(), it seems that it was waiting for an XReply that never came. Also, xf86-video-qxl has had no recent updates. So my theory is that the updated xorg-xserver or some other component has a race condition, which currently only manifests itself on the TestBots.
I grabbed the package list of debiant before and after the June upgrade for comparison. But the closest I got to relevant package upgrades is a bunch of OpenGL (libegl, libgl1, libgles2) and Mesa driver upgrades. Maybe I missed something.
2020-05-04 -> https://termbin.com/br19
2020-06-15 -> https://termbin.com/8cb9
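A before/after comparison like this can be automated so that small version bumps are not overlooked. The following is a minimal sketch, assuming each list has already been parsed into name/version pairs sorted by package name. The actual format of the lists above is not shown here, and struct pkg and find_changed are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one installed package. */
struct pkg
{
    const char *name;
    const char *version;
};

/* Walk two lists sorted by name (strcmp order) and collect pointers to
 * the entries of 'after' whose version string differs from 'before'.
 * Packages present in only one list are skipped. At most max_changed
 * entries are stored. Returns the number of entries stored. */
static size_t find_changed(const struct pkg *before, size_t nbefore,
                           const struct pkg *after, size_t nafter,
                           const struct pkg **changed, size_t max_changed)
{
    size_t i = 0, j = 0, n = 0;

    assert(before != NULL && after != NULL && changed != NULL);
    while (i < nbefore && j < nafter)
    {
        int cmp = strcmp(before[i].name, after[j].name);

        if (cmp < 0) i++;       /* only in the old list: removed */
        else if (cmp > 0) j++;  /* only in the new list: newly installed */
        else
        {
            if (strcmp(before[i].version, after[j].version) != 0 && n < max_changed)
                changed[n++] = &after[j];
            i++;
            j++;
        }
    }
    return n;
}
```

Comparing the version strings byte for byte means that even a rebuild suffix appended to an otherwise identical version is reported as a change.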
It looks like a libX11 bug. Comparing the package lists from 2020-05-04 and 2020-06-15, note that libx11-6 was upgraded from 1.6.9-2 to 1.6.9-2+b1. There is an XReply bug in both the latest and 1.6.9 versions of libX11, and I think it's being triggered somehow on the TestBots. See https://gitlab.freedesktop.org/xorg/lib/libx11/-/issues/93. There is a patch in https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/29 but it's not merged yet.
Maybe we should downgrade the package for the moment and wait for the fix to land. Or install a custom build with the fix.