Hi,
When downloading headers form multiple groups Agent often locks up.
The program does the following:
-(Main). main thread creates a sub thread:
| 0009:Call kernel32.CreateThread(00000000,00002000,006648e0,41ac6cb0,00000004,41ac6cb0) ret=006649c9 | 0009: *fd* 23 <- 41 | 0009: new_thread( suspend=1, inherit=0, request_fd=23 ) | 0009: new_thread() = 0 { tid=0021, handle=0xcc } | 0021: *fd* 27 <- 42 | 0021: *fd* 29 <- 45 | 0021: init_thread( unix_pid=4966, unix_tid=-1, teb=0x40319000, entry=0x404fa110, reply_fd=27, wait_fd=29 ) | 0021: init_thread() = 0 { pid=0008, tid=0021, boot=0, version=126 } | 0021: select( flags=8, cookie=0x4031ac78, timeout=1068729666.690492 (-0.000050), handles={} ) | 0021: select() = PENDING | 0009:Ret kernel32.CreateThread() retval=000000cc ret=006649c9
-(SUB). Sub thread updates some files and finishes with sending a WM_NULL message to the main thrad:
| 0021:Call user32.PostMessageA(00010026,00000000,00000000,00000000) ret=0058d0d6 | 0021:Call kernel32.GetTickCount() ret=407924ca | 0021:Ret kernel32.GetTickCount() retval=000ab0db ret=407924ca | 0021: send_message( id=0009, type=6, flags=0, win=0x10026, msg=00000000, wparam=00000000, lparam=00000000, x=0, y=0, time=000ab0db, info=00000000, timeout=-1, callback=(nil), data={} ) | 0021: send_message() = 0 | 0021:Ret user32.PostMessageA() retval=00000001 ret=0058d0d6
-(SUB). Close the thread handle:
| 0021:Call kernel32.CloseHandle(000000cc) ret=006648d0 | 0021: close_handle( handle=0xcc ) | 0021: close_handle() = 0 { fd=-1 } | 0021:Ret kernel32.CloseHandle() retval=00000001 ret=006648d0
-(SUB). terminates:
| 0021:Call kernel32.ExitThread(00000000) ret=006648df | 0021: terminate_thread( handle=0xfffffffe, exit_code=0 ) | 0021: terminate_thread() = 0 { self=1, last=0 } [snip] | 0021:Ret PE DLL (proc=0x4079ad40,module=0x406f0000 L"user32.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0021:Call PE DLL (proc=0x40cef0a0,module=0x40cb0000 L"x11drv.dll",reason=THREAD_DETACH,res=(nil)) | 0021:Ret PE DLL (proc=0x40cef0a0,module=0x40cb0000 L"x11drv.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0021:Call PE DLL (proc=0x404c8980,module=0x40480000 L"kernel32.dll",reason=THREAD_DETACH,res=(nil)) | 0021:Ret PE DLL (proc=0x404c8980,module=0x40480000 L"kernel32.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0021: *killed* exit_code=0
-(Main). In the meantime the main thread: - gets data from the server (winsok send and rcv) - writes to the file - processes queued messages until it is empty - then is wait on the handle of the subthread:
| 0009:Call kernel32.WaitForSingleObject(000000cc,ffffffff) ret=004ed693
-(Main). which ends immedeately:
| 0009: select( flags=4, cookie=0x406cf4e0, timeout=0, handles={0xcc} ) | 0009: select() = INVALID_HANDLE | 0009:Ret kernel32.WaitForSingleObject() retval=ffffffff ret=004ed693
-(Main). close the subthread handle.
| 0009:Call kernel32.CloseHandle(000000cc) ret=004ed69c | 0009: close_handle( handle=0xcc ) | 0009: close_handle() = INVALID_HANDLE { fd=0 } | 0009:Ret kernel32.CloseHandle() retval=00000000 ret=004ed69c
This may repeat a number of times but then without warning:
-(SUB). terminates as usual:
| 0023:Ret PE DLL (proc=0x4079ad40,module=0x406f0000 L"user32.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0023:Call PE DLL (proc=0x40cef0a0,module=0x40cb0000 L"x11drv.dll",reason=THREAD_DETACH,res=(nil)) | 0023:Ret PE DLL (proc=0x40cef0a0,module=0x40cb0000 L"x11drv.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0023:Call PE DLL (proc=0x404c8980,module=0x40480000 L"kernel32.dll",reason=THREAD_DETACH,res=(nil)) | 0023:Ret PE DLL (proc=0x404c8980,module=0x40480000 L"kernel32.dll",reason=THREAD_DETACH,res=(nil)) retval=1 | 0023: *killed* exit_code=0
-(Main). Finishes its message processing and the waits:
| 0009:Call kernel32.WaitForSingleObject(000000cc,ffffffff) ret=004ed693 | 0009: select( flags=4, cookie=0x406cf4e0, timeout=0, handles={0xcc} ) | 0009: select() = PENDING
And this wait never returns until explicitly killed:
| 0009: *killed* exit_code=0
I looked at many of these sequences. In all of them the subthread finished well before the main thread executed its Wait. I was looking for timing differences between the processing of the two threads, checking whether and when the WM_NULL message gets processed. Nothing seems to stand out.
Any suggestions how to debug this?
A large cut from the +relay,+server trace is on http://www.xs4all.nl/~rklazes/temp/agserver.log.bz2
Rein.