http://bugs.winehq.org/show_bug.cgi?id=20390
Dan Kegel dank@kegel.com changed:
What    |Removed                                                 |Added
--------------------------------------------------------------------------------------------------------------------
Summary |chromium ipc_tests.exe hangs sometimes on               |chromium ipc_tests.exe hangs sometimes on
        |IPCSyncChannelTest.Multiple                              |IPCSyncChannelTest.*
--- Comment #3 from Dan Kegel dank@kegel.com 2009-10-28 19:27:05 --- I stared at this today for several hours. I'm kind of out of steam for today, so here's a brain dump.
Running IPCSyncChannelTest.* will hang fairly reliably. IPCSyncChannelTest.Simple hangs about half the time, even if you comment out the second half of the test.
Test shutdown is supposed to go like this:

  main thread:     ~Worker posts call to OnListenerThreadShutdown1, waits
  listener thread: OnListenerThreadShutdown1 posts call to OnIPCThreadShutdown
  ipc thread:      OnIPCThreadShutdown posts call to OnListenerThreadShutdown2
  main thread:     wake up, finish.
Unfortunately, sometimes the task to call OnIPCThreadShutdown() never gets run. You can see the guts of the task queueing in
http://src.chromium.org/viewvc/chrome/trunk/src/base/message_pump_win.cc
The task queuer puts the task in the incoming queue, then calls
void MessagePumpForIO::ScheduleWork() {
  if (InterlockedExchange(&have_work_, 1))
    return;  // Someone else continued the pumping.

  // Make sure the MessagePump does some work for us.
  BOOL ret = PostQueuedCompletionStatus(port_, 0,
                                        reinterpret_cast<ULONG_PTR>(this),
                                        reinterpret_cast<OVERLAPPED*>(this));
  DCHECK(ret);
}
to wake up the consumer, who is sleeping in GetQueuedCompletionStatus().
With winedbg, I can get a good backtrace of the hang (yay pdb support):
4 WaitForSingleObject+0x3c(handle=0x70, timeout=4294967295) [wine-git/dlls/kernel32/sync.c:129] in kernel32
5 base::WaitableEvent::Wait+0x3c() [src\base\waitable_event_win.cc:62] in ipc_tests
6 `anonymous namespace'::Worker::~Worker+0xb0() [src\ipc\ipc_sync_channel_unittest.cc:79] in ipc_tests
7 `anonymous namespace'::SimpleClient::~SimpleClient+0x16() in ipc_tests
8 `anonymous namespace'::SimpleClient::`scalar deleting destructor'+0x16() in ipc_tests
9 STLDeleteContainerPointers<std::_Vector_iterator<`anonymous namespace'::Worker *,std::allocator<`anonymous namespace'::Worker *> > >+0x74(begin={}, end={}) [src\base\stl_util-inl.h:67] in ipc_tests
10 `anonymous namespace'::RunTest+0x11c(workers={_Myfirst=0x9bbc70, _Mylast=0x9bbc78, _Myend=0x9bbc78}) [src\ipc\ipc_sync_channel_unittest.cc:248] in ipc_tests
11 `anonymous namespace'::Simple+0xae(pump_during_send=false) [src\ipc\ipc_sync_channel_unittest.cc:289] in ipc_tests
12 IPCSyncChannelTest_Simple_Test::TestBody+0x15() [src\ipc\ipc_sync_channel_unittest.cc:297] in ipc_tests
(though I don't *quite* trust the line numbers).
A +relay trace of the problem looks kind of like this (with a few extra prints):

...
001f:Ret  KERNEL32.CreateEventW() retval=00000070 ret=00472b92
001f:Call KERNEL32.OutputDebugStringA("in ~Worker, created listener_done event handle 0x70")
001f:Call KERNEL32.OutputDebugStringA("in ~Worker, created ipc_done event handle 0x74")
0023:Call KERNEL32.OutputDebugStringA("in OnListenerThreadShutdown1, ipc_event handle 0x74")
0023:Call KERNEL32.InterlockedExchange(009be830,00000001) ret=004bd24d
0023:Ret  KERNEL32.InterlockedExchange() retval=00000000 ret=004bd24d
0023:Call KERNEL32.PostQueuedCompletionStatus(000000c0,00000000,009be800,009be800) ret=004bd27b
0023:Ret  KERNEL32.PostQueuedCompletionStatus() retval=00000001 ret=004bd27b
0044:Ret  KERNEL32.GetQueuedCompletionStatus() retval=00000001 ret=004bd899
0044:Call KERNEL32.InterlockedExchange(009be830,00000000) ret=004bda05
0044:Ret  KERNEL32.InterlockedExchange() retval=00000001 ret=004bda05
0044:Call KERNEL32.GetQueuedCompletionStatus(000000c0,010be408,010be340,010be334,ffffffff) ret=004bd899
001f:Call KERNEL32.OutputDebugStringA(009c0078 "in ~Worker, ipc_thread handle 0xBC id 68")
001f:Call KERNEL32.OutputDebugStringA(009bbed0 "in ~Worker, about to wait on listener_done")
001f:Call KERNEL32.WaitForSingleObject(00000070,ffffffff) ret=00472ffc
It's a little hard to trace, since both the server and client sides of the test run in the same process. I tried adding print statements in message_pump_win.cc and rebuilding the test; that worked to some extent, but when I got too aggressive with the prints, the hang went away.