https://bugs.winehq.org/show_bug.cgi?id=53014
Bug ID: 53014 Summary: HICON leak in CopyImage causes TheBat! to crash after a while Product: Wine Version: 7.5 Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: user32 Assignee: wine-bugs@winehq.org Reporter: jswinebz@kanargh.org.uk Distribution: ---
So, I just updated from my locally patched wine to the current F35 version, which is 7.5 plus staging.
That means I lost my version of the patch on bug 52207 and gained the official version.
Unfortunately my original problem then reappeared. After about 30 minutes runtime The Bat! freezes up and cannot even repaint its windows.
If I disable its "Animate System Tray Icon" preference, then it runs stable indefinitely. (As it did under my version of the patch against 6.21.)
The debug trace output doesn't look entirely the same: I'm not seeing any clear indication of HBITMAP overflow, in fact those counts look stable and reasonable. But it definitely looks like *something* is still leaking within CopyImage, and eventually causing a complete breakdown of GDI.
(I'll continue to dig into this, just raising in case it's more obvious to someone else.)
https://bugs.winehq.org/show_bug.cgi?id=53014
jswinebz@kanargh.org.uk changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
https://bugs.winehq.org/show_bug.cgi?id=53014
Fabian Maurer dark.shadow4@web.de changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |dark.shadow4@web.de
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #1 from jswinebz@kanargh.org.uk --- Ok, I notice this difference between my previous patched 6.21 and the current 7.5.
In user32/cursoricon.c CopyImage we used to have:
if (!(icon = get_icon_ptr( hnd ))) return 0;
if (icon->rsrc && (flags & LR_COPYFROMRESOURCE)) { hnd = CURSORICON_Load( icon->module, icon->resname, desiredx, desiredy, depth, !icon->is_icon, flags ); release_user_handle_ptr( icon ); if (!(icon = get_icon_ptr( hnd ))) return 0; }
Now in 7.5 we have:
if (!GetIconInfoExW( hnd, &icon_info )) return 0;
if (icon_info.szModName[0] && (flags & LR_COPYFROMRESOURCE) && (module = GetModuleHandleW( icon_info.szModName ))) { const WCHAR *res = icon_info.szResName[0] ? icon_info.szResName : MAKEINTRESOURCEW( icon_info.wResID ); resource_icon = CURSORICON_Load( module, res, desiredx, desiredy, depth, !icon_info.fIcon, flags ); DeleteObject( icon_info.hbmColor ); DeleteObject( icon_info.hbmMask ); NtUserGetIconSize( resource_icon, 0, &width, &height ); if (!GetIconInfoExW( resource_icon, &icon_info )) return 0; }
If I add additional TRACE output then I get:
if (!GetIconInfoExW( hnd, &icon_info )) return 0; + TRACE("GetIconInfoExW -> mask=%p color=%p modname='%s'\n", icon_info.hbmMask, icon_info.hbmColor, debugstr_w(icon_info.szModName)); + if (icon_info.szModName[0] && (flags & LR_COPYFROMRESOURCE)) { + module = GetModuleHandleW( icon_info.szModName ); + TRACE("module=%p\n", module); + }
0124:trace:cursor:CopyImage CopyImage: hnd=0000000000010178, type=1, desiredx=16, desiredy=16, flags=4000 ... 0124:trace:cursor:CopyImage GetIconInfoExW -> mask=000000003B090583 color=00000000A509043F modname='L"C:\Prog"' 0124:trace:cursor:CopyImage module=0000000000000000
So immediately we can see that it *looks* like szModName is truncated[*], and that GetModuleHandle is failing to find the HMODULE for that dll. (And we can infer from the first four letters of the path that it is likely one of the application's own dlls containing its resources...) As such we'll muddle our way through the non-COPYFROMRESOURCE path. I haven't figured out the entire consequences of that yet, but it clearly wasn't what was intended...
[*] (and for this call it always appears to be truncated to either 7-chars/an 8-char buffer or 20-chars/a 21-char buffer. There are the occasional calls for icons from "C:\windows\system32\user32.dll" which log the complete path correctly. It could be that dbgstr_w is running out of temp-buffer and truncating, but since we are then failing the GetModuleHandle call I assume that the underlying data actually is the problem.)
Anyway, we carry on doing that for a while, and the systray-icon *does* display and animate, but after half an hour we get to:
0124:trace:cursor:CopyImage GetIconInfoExW -> mask=000000002209024B color=00000000230900CD modname='L"C:\Prog"' 0124:err:sync:RtlpWaitForCriticalSection section 0000000170067640 "dlls/ntdll/loader.c: loader_section" wait timed out in thread 0124, blocked by 020c, retrying (60 sec) 0210:err:sync:RtlpWaitForCriticalSection section 0000000170067640 "dlls/ntdll/loader.c: loader_section" wait timed out in thread 0210, blocked by 020c, retrying (60 sec)
The first TRACE always happens, but we often don't even get the "wait timed out" TRACEs after it. However we never get to the "module=" TRACE line so it looks like we've deadlocked there somewhere within GetModuleHandle...
I'll further note that since my original patch, which appeared affective, was triggered within the original if(COPYFROMRESOURCE) statement in 6.21, that icon->rsrc *must* have been non-NULL and we *must* have been hitting the CURSORICON_Load path in that case (without, in that version, any need to call GetModuleHandle or look at szModName).
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #2 from jswinebz@kanargh.org.uk --- Created attachment 72494 --> https://bugs.winehq.org/attachment.cgi?id=72494 Fix UNICODE_STRING handling in NtUserGetIconInfo
Well, the module filename truncation is due to NtUserGetIconInfo messing up its string handling. Firstly, UNICODE_STRING *definitely* counts in bytes, always, never characters. Secondly, when calculating the length of the returned res_name, it stores the value in the module name's buffer, overwriting the module name length it just calculated above. Since resource names tend to be shortish strings, this will usually have the effect of truncating the returned path.
That gets us into the correct path within CopyImage to do COPYFROMRESOURCE properly, but doesn't solve the deadlock.
Turning on +module too, at the point of the crash we get:
05fc:trace:cursor:CopyImage hnd=000000000006016C, type=1, desiredx=16, desiredy=16, flags=4000 [...] 05fc:trace:cursor:CURSORICON_Load 0000000000400000, L"Z_BAT_1", 16x16, depth 32, fCursor 0, flags 0x4000 05fc:trace:module:FindResourceExW 0000000000400000 #000e L"Z_BAT_1" 0000 05fc:trace:module:LoadResource 0000000000400000 00000000026B7AE0 05fc:trace:cursor:CURSORICON_FindBestIcon entry 0: 32 x 32, 4 bpp 05fc:trace:cursor:CURSORICON_FindBestIcon entry 1: 16 x 16, 4 bpp 05fc:trace:cursor:CURSORICON_FindBestIcon entry 2: 16 x 16, 8 bpp 05fc:trace:module:FindResourceExW 0000000000400000 #0003 #0023 0000 05fc:trace:module:LoadResource 0000000000400000 00000000026B5F90 07dc:trace:module:LdrShutdownThread () 07dc:trace:module:MODULE_InitDLL (00000000043E0000 L"libegl.dll",THREAD_DETACH,0000000000000000) - CALL 07dc:trace:module:MODULE_InitDLL (00000000043E0000,THREAD_DETACH,0000000000000000) - RETURN 1 [...] 05fc:trace:module:LdrGetDllFullName module 0000000000400000, name 0000000002EEE1D0. [...] 07dc:trace:module:MODULE_InitDLL (00000003AFD00000 L"imm32.dll",THREAD_DETACH,0000000000000000) - CALL 07dc:trace:module:MODULE_InitDLL (00000003AFD00000,THREAD_DETACH,0000000000000000) - RETURN 1 07dc:trace:module:MODULE_InitDLL (000000023D820000 L"user32.dll",THREAD_DETACH,0000000000000000) - CALL 05fc:err:sync:RtlpWaitForCriticalSection section 0000000170067640 "dlls/ntdll/loader.c: loader_section" wait timed out in thread 05fc, blocked by 07dc, retrying (60 sec) 07e0:err:sync:RtlpWaitForCriticalSection section 0000000170067640 "dlls/ntdll/loader.c: loader_section" wait timed out in thread 07e0, blocked by 07dc, retrying (60 sec) 05fc:trace:module:FindResourceExW 0000000000400000 #0006 #0ffd 0000 05fc:trace:module:LoadResource 0000000000400000 00000000026B6330 05fc:trace:module:FindResourceExW 000000007B600000 #0006 #0101 0809 05fc:trace:module:LoadResource 000000007B600000 000000007B65DD10 [...] 05fc:trace:module:FindResourceExW 000000023D820000 #0005 L"MSGBOX" 0009 05fc:trace:module:LoadResource 000000023D820000 000000023D8DD020 05fc:err:system:user_check_not_lock BUG: holding USER lock m Files\The Bat!\thebat64.exe: dlls/win32u/sysparams.c:385: user_check_not_lock: Assertion `0' failed. [...thread 5fc crashing due to deadlock detection...]
So a different thread has apparently exited, and so we called user32's DllMain for THREAD_DETACH, but DllMain never returned. This thread exit happened while we were in the middle of executing CURSORICON_Load - presumably this takes a lock for either user32 or NtUser which DllMain also needs to proceed. But calling DllMain takes the loader_section lock, which CURSORICON_Load will also require when it calls LdrGetDllFullName to fill in its own cursoricon_object structure...
Deadlock!
(Since the DllMain locking behaviour is prehistoric, this must be a change in the way user32 handles THREAD_DETACH between 6.21 and 7.5...)
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #3 from jswinebz@kanargh.org.uk --- (Or CURSORICON_Load is taking its lock for longer now...)
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #4 from jswinebz@kanargh.org.uk --- (Hmm. Unrelated, but it's a pity that "user32: Don't access cursoricon_object directly in CopyImage" by moving to call GetIconInfoExW forces us to make whole new bitmap copies that we specifically do not want in any of the 3 CopyImage code paths, because we'll always end up making another copy later...)
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #5 from jswinebz@kanargh.org.uk --- Ok, think I've nailed the deadlock.
When bits of user32's thread detach were moved into win32u, including the destroy_thread_windows() function, that function gained a call to user_lock().
The Bat! can animate both its systray icon, and the taskbar icon. The latter doesn't actually work any more for me, I believe due to gtk deciding that people shouldn't have access to that sort of UI decoration any more, but the Win32 side of things is still handled.
To animate the taskbar icon it ultimately sets the window class word containing the icon handle, which ends up in user32:class.c:set_class_long, case GCLP_HICON, itself under user_lock. This calls CopyImage, which can call either GetModuleHandleW or LdrGetDllFullName to access the icon resource information, both of which take the process lock (loader_section).
On the other side a thread is in DllMain having already taken the process lock, which blocks the thread calling CopyImage, then tries to call destroy_thread_windows(), deadlocking on user_lock held by the thread calling CopyImage.
So there I'm stuck: someone clearly thought adding user_lock() to destroy_thread_windows() was necessary, but taking additional locks within DllMain is a highly dangerous act and always has been. That lock must be removed again from destroy_thread_windows(), but that can't necessarily be done simply without addressing whatever other thing it was there to protect.
(There might be nothing else required and the addition was mere paranoia, but that requires bit more understanding of wine internals than I have.)
https://bugs.winehq.org/show_bug.cgi?id=53014
Jacek Caban jacek@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |jacek@codeweavers.com
--- Comment #6 from Jacek Caban jacek@codeweavers.com --- Those CopyImage calls are wrong, PE calls should not hold user lock. I think we should fix it in class.c to make sure all those calls don't hold the lock.
BTW, the lock is nothing new, it was previously hold by user32 and is needed when accessing user_handles.
UNICODE_STRING Length handling really uses characters in this case, see related tests: https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/win32u/tests/win32u.c... The module/res typo was fixed by Brendan recently: https://source.winehq.org/git/wine.git/commitdiff/b88ba537ea879772dc86fc8b81...
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #7 from jswinebz@kanargh.org.uk --- I refer you to https://docs.microsoft.com/en-us/windows/win32/api/subauth/ns-subauth-unicod...
Length
Specifies the length
MaximumLength
Specifies the total size, in bytes
There is no "in this case", UNICODE_STRING members are in *bytes* in *all cases*.
And there is clearly a bug relative to the patch I posted because half the code assumes bytes (and is correct) and half assumes characters (and it wrong).
There is just no question about that: that is the public windows interface.
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #8 from jswinebz@kanargh.org.uk --- As for the lock: it is entirely proper and reasonable, I guess, for class.c to hold the user_lock when doing such modifications. If it wants to. It is after all only interacting with other parts of user32.
What it not right is for DllMain to take that same lock within destroy_thread_windows: taking additional locks within DllMain is *bad* and *wrong* and has been well-known to be such for about 30 fucking years. Here is the reference for *that*:
https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-bes...
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #9 from jswinebz@kanargh.org.uk --- (In reply to jswinebz from comment #7)
Length
Specifies the length
Length
Specifies the length, in bytes
That should have read. The documentation is clear: that structure counts in *bytes* *always*.
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #10 from jswinebz@kanargh.org.uk --- PS: what exactly was the link to the /tests/ (line 377) code supposed to demonstrate?
358 memset( module, 0xcc, sizeof(module) ); 359 module_str.Buffer = module; 360 module_str.Length = 0xdead; 361 module_str.MaximumLength = sizeof(module); 362 memset( res_buf, 0xcc, sizeof(res_buf) ); 363 res_str.Buffer = res_buf; 364 res_str.Length = 0xdead; 365 res_str.MaximumLength = sizeof(res_buf); 366 ret = NtUserGetIconInfo( handle, &info, &module_str, &res_str, NULL, 0 ); 367 ok( ret, "NtUserGetIconInfo failed: %lu\n", GetLastError() ); 368 ok( info.fIcon == TRUE, "fIcon = %x\n", info.fIcon ); 369 ok( module_str.Length, "module_str.Length = 0\n" ); 370 ok( !res_str.Length, "res_str.Length = %u\n", res_str.Length ); 371 ok( module_str.Buffer == module, "module_str.Buffer = %p\n", module_str.Buffer ); 372 ok( res_str.Buffer == (WCHAR *)IDI_HAND, "res_str.Buffer = %p\n", res_str.Buffer ); 373 DeleteObject( info.hbmColor ); 374 DeleteObject( info.hbmMask ); 375 376 module[module_str.Length] = 0; 377 ok( GetModuleHandleW(module) == GetModuleHandleW(L"user32.dll"), 378 "GetIconInfoEx wrong module %s\n", wine_dbgstr_w(module) );
It certainly doesn't appear to show that "UNICODE_STRING Length handling really uses characters in this case, see related tests" because (well firstly it's not actually true, and...) all we're really showing is that the returned module name length is non-zero (not necessarily the *right* length in any situation), and that an icon loaded from user32.dll is associated with the same DLL as GetModuleHandle for user32.dll... which obviously ought to be true. And doesn't really demonstrate the bug between system and application DLLs.
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #11 from Nikolay Sivov bunglehead@gmail.com --- (In reply to jswinebz from comment #10)
PS: what exactly was the link to the /tests/ (line 377) code supposed to demonstrate?
...
375 376 module[module_str.Length] = 0; 377 ok( GetModuleHandleW(module) == GetModuleHandleW(L"user32.dll"), 378 "GetIconInfoEx wrong module %s\n", wine_dbgstr_w(module) );
It certainly doesn't appear to show that "UNICODE_STRING Length handling really uses characters in this case, see related tests"
It shows this in line 376. Running on Windows, Length field here is 6, corresponding to the string "USER32". If it was in bytes you'd get a bunch of trailing 0xcc chars, and that would fail module handle check.
https://bugs.winehq.org/show_bug.cgi?id=53014
Carlo Bramini carlo.bramix@libero.it changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |carlo.bramix@libero.it
--- Comment #12 from Carlo Bramini carlo.bramix@libero.it --- I confirm that NtUserGetIconInfo() modifies the content of the UNICODE_STRING structures for returning the number of characters and NOT the bytes, although the MSDN says that the Length field must be something different. So, the attached patch is wrong.
As you can see, I copied the content of GetIconInfoExW() into a tiny C project made with VisualStudio 2022 and I tried to execute it into the debugger. From attached screenshot, the result is evident. The same thing happens with both 32bit and 64bit executables.
Tests made with Windows 11 Pro 64bit.
https://bugs.winehq.org/show_bug.cgi?id=53014
--- Comment #13 from Carlo Bramini carlo.bramix@libero.it --- Created attachment 77319 --> https://bugs.winehq.org/attachment.cgi?id=77319 GetIconInfoExW running in Windows 11