https://bugs.winehq.org/show_bug.cgi?id=45593
Anastasius Focht focht@gmx.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Wargaming.net Game Center: |Wargaming.net Game Center: |Installer deadlocks during |Installer deadlocks during |download |download | |('ntdll.RtlDeregisterWaitEx | |' must not synchronously | |wait when 'CompletionEvent' | |is NULL) CC| |focht@gmx.net Component|-unknown |ntdll Depends on|45599 |
--- Comment #2 from Anastasius Focht focht@gmx.net --- Hello folks,
confirming.
I've managed to capture a trace log from a run which had the deadlock. Filtered for the two relevant threads.
--- snip --- $ pwd /home/focht/.wine/drive_c/ProgramData/Wargaming.net/GameCenter
$ WINEDEBUG=+seh,+loaddll,+process,+ntdll,+threadpool,+relay wine ./wgc.exe --setLang -l en >>log.txt 2>&1 ... 005a:Call KERNEL32.InitializeCriticalSectionAndSpinCount(0b3019b4,00000000) ret=1051bd40 005a:Ret KERNEL32.InitializeCriticalSectionAndSpinCount() retval=00000001 ret=1051bd40 ... 005a:Call KERNEL32.CreateEventW(00000000,00000000,00000000,0b2f89c8 L"wgc_helper_15365709041175080_1781265027") ret=101285f6 005a:Ret KERNEL32.CreateEventW() retval=000002ac ret=101285f6 ... 005a:Call KERNEL32.CreateProcessW(0b301240 L"C:\ProgramData\Wargaming.net\GameCenter\dlls\wgc_helper.exe",0b2fefb8 L""C:\\ProgramData\\Wargaming.net\\GameCenter\\dlls\\wgc_helper.exe" wgc_helper_15365709041175080_1781265027",00000000,00000000,00000000,00000000,00000000,00000000,0e97e924,0e97ead0) ret=10505c94 ... 005a:trace:process:create_process_impl starting L"C:\ProgramData\Wargaming.net\GameCenter\dlls\wgc_helper.exe" as Win32 binary (400000-533000, arch 014c) ... 005a:trace:process:create_process_impl started process pid 005b tid 005c 005a:Ret KERNEL32.CreateProcessW() retval=00000001 ret=10505c94 ... 005a:Call KERNEL32.RegisterWaitForSingleObject(0b3019ac,000002b8,1051cae0,0b301950,ffffffff,00000008) ret=1051c3d0 005a:trace:threadpool:RtlRegisterWait (0xb3019ac, 0x2b8, 0x1051cae0, 0xb301950, -1, 0x8) 005a:trace:threadpool:RtlQueueWorkItem 0x7bca1c95 0x6d773c8 0 005a:trace:threadpool:TpSimpleTryPost 0x7bca1848 0x6d7f238 0xe97e9a8 005a:trace:threadpool:tp_threadpool_alloc allocated threadpool 0x6d7ae28 005a:trace:threadpool:tp_object_initialize allocated object 0x6d7ada8 of type 0 005d:Call TLS callback (proc=0x4f9910,module=0x400000,reason=THREAD_ATTACH,reserved=0) ... 005d:Ret PE DLL (proc=0xea4fde3,module=0xe990000 L"7z.dll",reason=THREAD_ATTACH,res=(nil)) retval=1 005d:Starting thread proc 0x7bca4d37 (arg=0x6d7ae28) 005d:trace:threadpool:threadpool_worker_proc starting worker thread for pool 0x6d7ae28 005d:trace:threadpool:threadpool_worker_proc executing simple callback 0x7bca1848(0xec5fe2c, 0x6d7f238) 005d:trace:threadpool:process_rtl_work_item executing 0x7bca1c95(0x6d773c8) 005d:trace:threadpool:wait_thread_proc 005a:Ret KERNEL32.WaitForSingleObject() retval=00000000 ret=101288b2 005a:Call KERNEL32.AssignProcessToJobObject(000002a4,000002b8) ret=10124941 005a:trace:ntdll:NtAssignProcessToJobObject (0x2a4 0x2b8) 005a:Ret KERNEL32.AssignProcessToJobObject() retval=00000001 ret=10124941 ... 005a:Call KERNEL32.TerminateProcess(000002b8,00000000) ret=1051d2b8 005a:Ret KERNEL32.TerminateProcess() retval=00000001 ret=1051d2b8 ... 005a:Call KERNEL32.UnregisterWaitEx(06d773c8,00000000) ret=1051cb6a 005a:trace:threadpool:RtlDeregisterWaitEx (0x6d773c8 (nil)) 005d:trace:threadpool:wait_thread_proc object 0x2b8 signaled, calling callback 0x1051cae0 with context 0xb301950 005d:Call KERNEL32.GetExitCodeProcess(000002b8,0ec5fd3c) ret=1051c8a6 005d:trace:ntdll:NtQueryInformationProcess (0x2b8,0x00000000,0xec5fc80,0x00000018,(nil)) 005d:Ret KERNEL32.GetExitCodeProcess() retval=00000001 ret=1051c8a6 005d:Call msvcp140._Mtx_lock(10bea81c) ret=104ff5ea 005d:Call msvcr120.?lock@critical_section@Concurrency@@QAEXXZ(10bea820) ret=7c37fe8d 005d:Ret msvcr120.?lock@critical_section@Concurrency@@QAEXXZ() retval=0ec5fbec ret=7c37fe8d ... 005d:Call msvcp140._Mtx_unlock(008b6fd8) ret=1051b954 005d:Call msvcr120.?unlock@critical_section@Concurrency@@QAEXXZ(008b6fdc) ret=7c37ff22 005d:Ret msvcr120.?unlock@critical_section@Concurrency@@QAEXXZ() retval=008b6fe0 ret=7c37ff22 005d:Ret msvcp140._Mtx_unlock() retval=00000000 ret=1051b954 005d:err:ntdll:RtlpWaitForCriticalSection section 0xb3019b4 "?" wait timed out in thread 005d, blocked by 005a, retrying (60 sec) 005d:err:ntdll:RtlpWaitForCriticalSection section 0xb3019b4 "?" wait timed out in thread 005d, blocked by 005a, retrying (5 min) --- snip ---
The Wargaming Game Center process 'wgc.exe' launches a helper process 'wgc_helper.exe' whenever a file/patch has been downloaded. It registers a callback for notification of child process exit using 'RegisterWaitForSingleObject()'. The child process exits on its own or is forcibly terminated, see log.
The race condition/deadlock occurs when the app calls 'UnregisterWaitEx( wait_handle, NULL)' before the wait callback on the threadpool thread is processed. It creates a local completion event and waits synchronously for the signalling, blocking the caller. Then the child process exit waiter thread is woken up, with the app provided callback being called. Unfortunately the app uses a critical section in a questionable way (scope). It enters the critical section before 'UnregisterWaitEx()' on one thread and within the code path of the child process termination callback (some nested calls down), executed in the threadpool thread. This causes a deadlock.
https://docs.microsoft.com/en-us/windows/desktop/sync/unregisterwaitex
--- quote --- BOOL WINAPI UnregisterWaitEx( _In_ HANDLE WaitHandle, _In_opt_ HANDLE CompletionEvent );
Parameters
WaitHandle [in]
The wait handle. This handle is returned by the RegisterWaitForSingleObject function.
CompletionEvent [in, optional]
A handle to the event object to be signaled when the wait operation has been unregistered. This parameter can be NULL.
If this parameter is INVALID_HANDLE_VALUE, the function waits for all callback functions to complete before returning.
If this parameter is NULL, the function marks the timer for deletion and returns immediately. However, most callers should wait for the callback function to complete so they can perform any needed cleanup.
If the caller provides this event and the function succeeds or the function fails with ERROR_IO_PENDING, do not close the event until it is signaled.
... Under some conditions, the function will fail with ERROR_IO_PENDING if CompletionEvent is NULL. This indicates that there are outstanding callback functions. Those callbacks either will execute or are in the middle of executing. --- quote ---
Although 'CompletionEvent' is NULL, Wine currently behaves as if 'INVALID_HANDLE_VALUE' has been passed which causes synchronous wait for threadpool waiter thread event/callback(s) processing which is incorrect.
https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/ntdll/threadpool.c#l6...
--- snip --- 635 NTSTATUS WINAPI RtlDeregisterWaitEx(HANDLE WaitHandle, HANDLE CompletionEvent) 636 { 637 struct wait_work_item *wait_work_item = WaitHandle; 638 NTSTATUS status; 639 HANDLE LocalEvent = NULL; 640 BOOLEAN CallbackInProgress; 641 642 TRACE( "(%p %p)\n", WaitHandle, CompletionEvent ); 643 644 if (WaitHandle == NULL) 645 return STATUS_INVALID_HANDLE; 646 647 CallbackInProgress = wait_work_item->CallbackInProgress; 648 if (CompletionEvent == INVALID_HANDLE_VALUE || !CallbackInProgress) 649 { 650 status = NtCreateEvent( &LocalEvent, EVENT_ALL_ACCESS, NULL, NotificationEvent, FALSE ); 651 if (status != STATUS_SUCCESS) 652 return status; 653 interlocked_xchg_ptr( &wait_work_item->CompletionEvent, LocalEvent ); 654 } 655 else if (CompletionEvent != NULL) 656 { 657 interlocked_xchg_ptr( &wait_work_item->CompletionEvent, CompletionEvent ); 658 } 659 660 NtSetEvent( wait_work_item->CancelEvent, NULL ); 661 662 if (interlocked_inc( &wait_work_item->DeleteCount ) == 2 ) 663 { 664 status = STATUS_SUCCESS; 665 delete_wait_work_item( wait_work_item ); 666 } 667 else if (LocalEvent) 668 { 669 NtWaitForSingleObject( LocalEvent, FALSE, NULL ); 670 status = STATUS_SUCCESS; 671 } 672 else 673 { 674 status = STATUS_PENDING; 675 } 676 677 if (LocalEvent) 678 NtClose( LocalEvent ); 679 680 return status; 681 } --- snip ---
Line 648 is problematic. This code was introduced/changed by https://source.winehq.org/git/wine.git/commitdiff/af35aada9b078f8dc71dcc85e5... which ought to make 'DeregisterWaitEx' more thread safe.
You need to return immediately here with "pending" status after signalling "cancel" event.
Also removing bug 45599 dependency here. It has no use.
$ sha1sum total_war_arena_install_eu_bo9nmmaomva1.exe b569b010964c46327b3ce8b77611bc4513d25388 total_war_arena_install_eu_bo9nmmaomva1.exe
$ du -sh total_war_arena_install_eu_bo9nmmaomva1.exe 4.7M total_war_arena_install_eu_bo9nmmaomva1.exe
$ wine --version wine-3.15-94-gbfe8510ec0
Regards