Hi all,
I decided to investigate this old bug:
http://www.winehq.com/hypermail/wine-devel/2003/12/0158.html
even though Alexander removed thread initialization from x11drv. So the first thing I did was apply this patch:
Index: x11drv_main.c =================================================================== RCS file: /home/wine/wine/dlls/x11drv/x11drv_main.c,v retrieving revision 1.87 diff -u -r1.87 x11drv_main.c --- x11drv_main.c 30 Jan 2004 22:54:39 -0000 1.87 +++ x11drv_main.c 1 Feb 2004 18:02:26 -0000 @@ -332,6 +332,8 @@
/* Open display */
+ if (!XInitThreads()) ERR( "XInitThreads failed, trouble ahead\n" ); + if (!(display = XOpenDisplay( NULL ))) { MESSAGE( "x11drv: Can't open display: %s\n", XDisplayName(NULL) );
I determined that the tests under user/tests were causing deadlock. The attached file debugrun1 shows this. Start the debugger with:
winedbg user32_test.exe.so
and repeat the steps I made, and the executable should deadlock at the call to XGetScreenSaver. (By the way, I used a version of the tests where I'd commented out all other tests except the test_SPI_GETSCREENSAVEACTIVE one, and that's why you only see "testing SPI_{GET,SET}SCREENSAVEACTIVE", but that's not important.)
Next, I decided to investigate which mutual exclusion primitive is causing the trouble. This run is included as debugrun2. There are two threads involved: thread 0xb is the application main thread, and 0xe is a thread created by the main thread to generate the tests. Thread 0xb owns a critical section that 0xe wants, and hence deadlock. Note the call to "walk thread" in the beginning. Note also that I single-stepped thread 0xe right up to the call to RtlpWaitForCriticalSection that deadlocks.
I also added some display commands to show you the details of the critical section in question. Now, if you look at what the display says you'll notice something odd: at first crit = 0x0011d1e0, but then it switches to 0x403112b8. And this is without any breakpoints having been set up within RtlEnterCriticalSection! So it can't be a context switch - it seems like a bug in the debugger. Or is this intentional in some way?
Anyway, this critical section is the one the x11drv library sets up when XInitThreads is called. Next I attempted to find out where the main thread acquires the critical section, before the deadlock. This is attached as debugrun3. Notice that I worked around the odd debugger feature mentioned in the previous paragraph, by placing a breakpoint not at the entry point of RtlEnterCriticalSection, but a few next's later. This allowed me to add the condition crit==0x403112b8. The "crit" pointer seems to consistently stay the same between debugger sessions, so this works (for me at least).
See how LockCount is 0, even though this is the first call to RtlEnterCriticalSection with crit==0x403112b8 during this session? It should be -1 for a critical section that is being acquired for the first time! So this made me suspect that the crit sect is acquired during initialization (and then not released), which is why I applied the following patch:
Index: critsection.c =================================================================== RCS file: /home/wine/wine/dlls/ntdll/critsection.c,v retrieving revision 1.23 diff -u -r1.23 critsection.c --- critsection.c 5 Sep 2003 23:08:34 -0000 1.23 +++ critsection.c 4 Feb 2004 20:15:46 -0000 @@ -243,6 +243,7 @@ */ NTSTATUS WINAPI RtlEnterCriticalSection( RTL_CRITICAL_SECTION *crit ) { + if (crit==(RTL_CRITICAL_SECTION *)0x403112b8) TRACE("critsection=%p: LockCount=%ld\n", crit, crit->LockCount); if (interlocked_inc( &crit->LockCount )) { if (crit->OwningThread == (HANDLE)GetCurrentThreadId())
and then launched the debugger with:
winedbg --debugmsg -all,+ntdll user32_test.exe.so
(I didn't bother with making the reference to crit->LockCount thread-safe.) The result of this is attached as debugrun4. Notice this line:
No debug information in ELF '/usr/X11R6/lib/libXrender.so.1' (0x40016000)
See how after four calls to RtlEnterCriticalSection, LockCount starts to be 0? So a pretty good theory is that the critical section is acquired, and then not released, during the loading and initialization of /usr/X11R6/lib/libXrender.so.1. This theory is not bulletproof, which is why I attempted to see whether LockCount ever becomes -1 again, but the debugger consistently barfed on me (== crashed).
Well, that's it for now - I'll let someone more knowledgeable in libraries and their initialization carry on with this, if the're interested. A final note: naturally, you most likely won't have the same address for the critical section as me, but you should be able to find out what it is if you redo the steps I made. Have fun!
-- Sami
"Sami Aario" sami.aario@surfeu.fi writes:
See how after four calls to RtlEnterCriticalSection, LockCount starts to be 0? So a pretty good theory is that the critical section is acquired, and then not released, during the loading and initialization of /usr/X11R6/lib/libXrender.so.1.
Most likely this is the missing UnlockDisplay() call in XRenderQueryFormat(). This was fixed in XFree 4.2.0.
I also added some display commands to show you the details of the critical section in question. Now, if you look at what the display says you'll notice something odd: at first crit = 0x0011d1e0, but then it switches to 0x403112b8. And this is without any breakpoints having been set up within RtlEnterCriticalSection! So it can't be a context switch - it seems like a bug in the debugger. Or is this intentional in some way?
it's not really intentional. However, the first asm insn of any functions (the prolog) are used to set up the frame for the function (ie the way to look at local vars...) in the first two instructions, the function's frame is not properly set up, however, winedbg believes that it's set up and use the current information to look at local & passed variable (while the current frame is still the one from the caller). in general this is not problematic, it may be trickier when using conditional breakpoints on passed variables (where you'll never get the correct information). Setting the bp after the prolog (as you did) is the right solution.
A+