I'm able to consistently reproduce a kernel bug whilst running Wine (both 0305 and todays CVS) and the version of Outlook Express that comes with Win98.
That is, if I use a pretty stock Wine config, and launch msimn.exe, we get the splash screen, and then a process stuck in 'D' (uninterruptible sleep). At this point, AFAIK, the only solution is to wait for the IO to timeout (I haven't been patient enough to see if it does), or to reboot.
Because the process is already gone, I can't strace or gdb it.
Anyone a) Want to take this on? b) Have advice on how I can generate enough debug info that we can feed it to the Kernel guys to get it fixed?
And yes, this happens both on a 2.2 system and a 2.4 system.
I saw at least other user report on this (Steve Fox reported this with Lotus Notes).
My best guess is that this is a bug in the vfat driver, but that's just a hunch.
Jer
Hi Jeremy,
That is, if I use a pretty stock Wine config, and launch msimn.exe, we get the splash screen, and then a process stuck in 'D' (uninterruptible sleep). At this point, AFAIK, the only solution is to wait for the IO to timeout (I haven't been patient enough to see if it does), or to reboot.
Could you check in which kernel routine the process is blocked? (e.g. using ps -Ao pid,wchan)
Because the process is already gone, I can't strace or gdb it.
What do you mean 'the process is already gone'? Could it be that ptrace doesn't work because the Wine server has already attached to that client process?
Anyone a) Want to take this on? b) Have advice on how I can generate enough debug info that we can feed it to the Kernel guys to get it fixed?
And yes, this happens both on a 2.2 system and a 2.4 system.
Hmm. Could you try getting a Wine trace (especially the server/ client interaction)?
I saw at least other user report on this (Steve Fox reported this with Lotus Notes).
My best guess is that this is a bug in the vfat driver, but that's just a hunch.
Steve, is VFAT also involved in your case?
Bye, Ulrich
Could you check in which kernel routine the process is blocked? (e.g. using ps -Ao pid,wchan)
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND jwhite 2531 2.3 14.8 86144 38016 pts/0 D 16:02 0:05 wine --debugmsg +server msimn.exe
PID WCHAN 2531 down
Hmm. Could you try getting a Wine trace (especially the server/ client interaction)?
ftp://wine.codeweavers.com/pub/other/outlook-server.out.gz (about 160K).
Thanks,
Jer
Hi Jeremy,
Could you check in which kernel routine the process is blocked? (e.g. using ps -Ao pid,wchan)
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND jwhite 2531 2.3 14.8 86144 38016 pts/0 D 16:02 0:05 wine --debugmsg +server msimn.exe
PID WCHAN 2531 down
Great :-( Now the interesting thing would be to know *which* semaphore this is waiting on ...
I'd really like to see a kernel stack backtrace on this. However, there doesn't appear to be an easy way to do so ...
Hmm. Could you try getting a Wine trace (especially the server/ client interaction)?
ftp://wine.codeweavers.com/pub/other/outlook-server.out.gz (about 160K).
Is this from the same run as above? I.e. is the process with pid 2531 part of this trace?
B.t.w. the trace appears to show that you pressed ^C to terminate the process. Does the symptom (processes in state D) show *before* you press ^C or only afterwards?
Bye, Ulrich
PID WCHAN 2531 down
Great :-( Now the interesting thing would be to know *which* semaphore this is waiting on ...
I'd really like to see a kernel stack backtrace on this. However, there doesn't appear to be an easy way to do so ...
I was pretty sure this was going to be a fun one...
Hmm. Could you try getting a Wine trace (especially the server/ client interaction)?
ftp://wine.codeweavers.com/pub/other/outlook-server.out.gz (about 160K).
Is this from the same run as above? I.e. is the process with pid 2531 part of this trace?
Yes.
B.t.w. the trace appears to show that you pressed ^C to terminate the process. Does the symptom (processes in state D) show *before* you press ^C or only afterwards?
I'm nearly certain that I see the process show up as run state 'D' prior to pressing ^C, but I may be wrong. I tend to notice this problem, of course, only after pressing ^C and trying to do a killall. I'll poke around a bit more (sorry for not reproducing this more quickly; since it tends to cost me a reboot and a lengthy fsck to repair, I'm a little slow to iterate the test cycles <g>).
Jer
Hi Jeremy,
Great :-( Now the interesting thing would be to know *which* semaphore this is waiting on ...
I'd really like to see a kernel stack backtrace on this. However, there doesn't appear to be an easy way to do so ...
I was pretty sure this was going to be a fun one...
OK, I've hacked together a silly tool that generates a kernel stack backtrace from userspace.
Note that this doesn't do much error checking; please compile against kernel sources that match the running kernel, and don't try to use the tool for *running* processes ...
To generate the dump, just run 'dump <pid>'. The result is suitable for piping through 'ksymoops' for symbol lookup.
Bye, Ulrich
Just spent 30 minutes trying to reproduce with Outlook, and now, of course, I can't.
I may well have pressed ^C at some point; I'm learning that Outlook does, eventually (2-3 minutes) come up (but promptly crashes).
However, whilst trying something else, I had another winelib program crash.
So, I grabbed what info from it I could in the attached file.
Jer
Stuck Process:
jwhite 1503 0.5 2.6 32956 6684 pts/2 D 20:07 0:02 testapp -debugmsg +relay
ps -p 1503 -o pid,wchan PID WCHAN 1503 down
dump 1503
Process 1503 found at c2cd8000, esp c2cd9e04 Call Trace: [<c010794f>] [<c2cd8000>]
/usr/src/linux-2.4.0/System.map fragment:
c01078f8 T __down c0107994 T __down_interruptible c0107a60 T __down_trylock c0107a9c T __down_failed . . . c02f6c14 A _end
I have the feeling there is something more I could have gotten you here, but I'm not sure what.
Last 21 lines or so of a relay log:
ebp=405b679c esp=405b6788 ds=002b es=002b gs=0000 flags=00000202 Ret kernel32.4: VxDCall3() retval=00000002 ret=bfe819e3 fs=008f eax=00000002 ebx=417c4188 ecx=00000001 edx=405b67cc esi=0000006c edi=405b67cc ebp=405b679c esp=405b6794 ds=002b es=002b gs=0000 flags=00000202 Call kernel32.455: GetPrivateProfileStringA(417bb8e7 "mci",00000000,417bb80a "",403d5b48,00000800,417bc626 "SYSTEM.INI") ret=4178b025 fs=008f Ret kernel32.455: GetPrivateProfileStringA() retval=00000051 ret=4178b025 fs=008f Call kernel32.2: VxDCall1(00010013) ret=bfe8165f fs=008f eax=00000001 ebx=417c4188 ecx=fffffffe edx=417c20e4 esi=0000006c edi=403d5b9a ebp=405b67d4 esp=405b67a0 ds=002b es=002b gs=0000 flags=00000202 Ret kernel32.2: VxDCall1() retval=00000000 ret=bfe8165f fs=008f eax=00000000 ebx=417c4188 ecx=fffffffe edx=417c20e4 esi=0000006c edi=403d5b9a ebp=405b67d4 esp=405b67a4 ds=002b es=002b gs=0000 flags=00000202 Call kernel32.947: lstrcpynA(405b668c,417bb4a7 "wineoss.drv",00000080) ret=4177f8db fs=008f Ret kernel32.947: lstrcpynA() retval=405b668c ret=4177f8db fs=008f Call kernel32.553: HeapAlloc(40360000,00000000,0000001c) ret=4177f666 fs=008f Ret kernel32.553: HeapAlloc() retval=403d8774 ret=4177f666 fs=008f Call kernel32.601: LoadLibraryA(405b668c "wineoss.drv") ret=4177f680 fs=008f Ret kernel32.601: LoadLibraryA() retval=417c8000 ret=4177f680 fs=008f Call kernel32.459: GetProcAddress(417c8000,417ba14b "DriverProc") ret=4177f6a0 fs=008f Ret kernel32.459: GetProcAddress() retval=417d8e90 ret=4177f6a0 fs=008f Call wineoss.1: DriverProc(00000000,403d8774,00000001,00000000,00000000) ret=4177f3e2 fs=008f
Process 1503 found at c2cd8000, esp c2cd9e04 Call Trace: [<c010794f>] [<c2cd8000>]
Hmpf, this is not quite what I expected :-/ (Maybe the kernel is compiled with -fomit-frame-pointer?)
Could you try using a modified version of dump that will at least dump out the complete kernel stack so that we can analyse it manually?
--- dump/dump.c Mon Mar 26 23:47:18 2001 +++ dump.new/dump.c Tue Mar 27 21:55:11 2001 @@ -95,6 +95,18 @@ printf( " [<%08x>]", eip ); } printf( "\n" ); + +{ + unsigned int stack[2048]; + unsigned int start = esp & -16; + unsigned int i, len = addr + 8192 - start; + read_kernel( stack, start, len ); + + printf( "Stack Dump:\n" ); + for ( i = 0; i < len/16; i++ ) + printf( "0x%08x: %08x %08x %08x %08x\n", + start + 16*i, stack[4*i], stack[4*i + 1], stack[4*i + 2], stack[4*i + 3] ); +} }
int main( int argc, char *argv[] )
Bye, Ulrich