https://bugs.winehq.org/show_bug.cgi?id=48482
Bug ID: 48482
Summary: Star Wars Knights of the Old Republic randomly crashes after failed malloc
Product: Wine
Version: 5.0-rc6
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: winelib
Assignee: wine-bugs@winehq.org
Reporter: info@fdossena.com
Distribution: ArchLinux
Created attachment 66272 --> https://bugs.winehq.org/attachment.cgi?id=66272 winedbg output of the crash
I'm trying to play Star Wars Knights of the Old Republic (KOTOR, for short) and the game randomly crashes on loading screens when played in Wine.
The crash is a null pointer exception (see attached kotorcrash.txt). Nothing special appears in the terminal; the crash was captured using winedbg.
The issue can be easily replicated by saving in front of a loading door and going back and forth a few times. It usually happens after 5-10 loads, so during normal gameplay the game crashes every 30 minutes or so, depending on the area.
My reverse engineering skills are minimal, but I know how to use IDA a bit, so I took a peek at the offending code with it (see attached isassembled.png). The instruction can be reached from 2 paths, one of which contains a malloc that I think is failing and returning 0. Why it fails I cannot tell. I tried placing a breakpoint but it gets called too often to be able to play the game (I need some way to break only if it returns 0, but I don't know how to do it).
I think malloc is part of winelib, so I chose that as the component for this bug report. If that's wrong, please move it to the correct section.
I am willing to help you debug the issue further if you can tell me exactly what to do. I can also provide game files or saved games to replicate the issue if you contact me via email.
Software:
* Manjaro Linux 18.1.5 KDE x86-64 (also tested on Ubuntu 19.10)
* KOTOR version 1.0.3 from GOG (also tested with Steam and disc versions)
* Wine version 5.0-rc6 built from source (also tested with 4.0.3 stable, 5.0-rc2 and 5.0-rc2 staging from package manager)

Hardware 1:
* AMD Athlon 300G
* 8GB DDR4
* AMD RX Vega 3 graphics (using open source amdgpu driver)

Hardware 2:
* Intel Core i9 9900k
* 64GB DDR4
* nVidia GTX 1080 (using both the open source nouveau driver and the proprietary one)
I've also tried playing in software rendering using Mesa Gallium on LLVMPipe; the issue was still present. Note: the game currently doesn't work on Intel graphics due to driver issues.
--- Comment #1 from Federico Dossena info@fdossena.com --- Created attachment 66273 --> https://bugs.winehq.org/attachment.cgi?id=66273 offending code
Federico Dossena info@fdossena.com changed:
What     |Removed |Added
----------------------------------------------------------------------------
Component|winelib |-unknown
joaopa jeremielapuree@yahoo.fr changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |jeremielapuree@yahoo.fr
--- Comment #2 from joaopa jeremielapuree@yahoo.fr --- Can you attach a saved game where the bug occurs? It would make the bug easier to reproduce.
--- Comment #3 from Federico Dossena info@fdossena.com --- Created attachment 66276 --> https://bugs.winehq.org/attachment.cgi?id=66276 savegame where the bug can be triggered easily. Just keep going back and forth through the door.
--- Comment #4 from Federico Dossena info@fdossena.com --- (In reply to joaopa from comment #2)
> Can you attach a saved game where the bug occurs? It would make the bug easier to reproduce.
Done
--- Comment #5 from joaopa jeremielapuree@yahoo.fr --- I confirm that the bug exists. Unfortunately the backtrace is very poor (even with winedbg)
--- Comment #6 from Federico Dossena info@fdossena.com --- (In reply to joaopa from comment #5)
> I confirm that the bug exists. Unfortunately the backtrace is very poor (even with winedbg)
Is there anything I can do to investigate this further? Do you know how I can put a conditional breakpoint on that malloc so I can see when it's failing?
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |z.figura12@gmail.com
--- Comment #7 from Zebediah Figura z.figura12@gmail.com --- Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
--- Comment #8 from Federico Dossena info@fdossena.com --- (In reply to Zebediah Figura from comment #7)
> Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
Can you help me understand the issue? I noticed that the game's virtual memory is 3.2 GB in size, but only a few hundred megabytes are actually used. Is this some kind of memory leak?
--- Comment #9 from Federico Dossena info@fdossena.com --- Would an strace be useful to investigate this issue? I've been trying to use ltrace but it won't attach to the kotor process, only to wineserver. I want to intercept mallocs and frees to see if there's anything interesting
Stefan Dösinger stefan@codeweavers.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |stefan@codeweavers.com
--- Comment #10 from Stefan Dösinger stefan@codeweavers.com --- 3.2 GB should still leave plenty of room. It could be that HeapAlloc (what msvcrt's malloc uses) fails because the heap structures are corrupted. Running with WINEDEBUG=warn+heap might give some clues.
Wine processes have a huge virtual memory footprint because of areas that wine-preloader blocks early on to keep them available for Windows things that need to be at a certain address. After the regular wine main() function starts it might be too late and some Linux library blocks the .exe's load address.
Does the game executable set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag? If not, Wine will block the entire area from 0x80000000-0xffffffff to behave like old 32 bit Windows that had a 2-2 memsplit and no userland pointer would ever have the highest bit set. Linux will see that as 2GB of address space being in use in the process...
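
One way to answer the large-address-aware question without a PE editor is to read the flag straight out of the file header. What follows is a minimal Python sketch, assuming a well-formed PE file. `is_large_address_aware` and `make_stub_image` are hypothetical helper names, and the stub is only a header-shaped byte string for demonstration, not a real executable:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image has IMAGE_FILE_LARGE_ADDRESS_AWARE set."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew (offset of the PE signature) lives at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last field, 18 bytes in.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_stub_image(characteristics: int) -> bytes:
    """Build a header-only stand-in for an .exe (illustration only)."""
    image = bytearray(0x80 + 24)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 4 + 18, characteristics)
    return bytes(image)
```

On a real file this would be called as `is_large_address_aware(open(path, "rb").read())`, where `path` points at the game executable.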
--- Comment #11 from Matteo Bruni matteo.mystral@gmail.com --- (In reply to Federico Dossena from comment #8)
> (In reply to Zebediah Figura from comment #7)
>> Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
> Can you help me understand the issue? I noticed that the game's virtual memory is 3.2 GB in size, but only a few hundred megabytes are actually used. Is this some kind of memory leak?
That suggests that something is allocating / reserving memory but not actually using it. I don't know that it tells us anything in particular though.
If the VIRT value goes up every time you go through the door then yeah, it does sound like a memory leak. It isn't immediately obvious what's leaking the memory though (e.g. is it the game itself or a specific Wine component?)
(In reply to Federico Dossena from comment #9)
> Would an strace be useful to investigate this issue? I've been trying to use ltrace but it won't attach to the kotor process, only to wineserver. I want to intercept mallocs and frees to see if there's anything interesting
You could get a +heap trace, with additional channels (+wgl,+opengl ?) to figure out what's the source of those allocations. It's going to be pretty huge though. Another thing that might shed some light is /proc/<pid>/maps.
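
As a rough illustration of the /proc/<pid>/maps suggestion, the sketch below totals mapped address space per backing file so that two snapshots (before and after going through the door) can be compared. It is a sketch only: `mapped_bytes_by_name` is a hypothetical helper name and `SAMPLE` is made-up data in the maps format, not output from the game:

```python
from collections import Counter

# Made-up excerpt in /proc/<pid>/maps format, for demonstration only.
SAMPLE = """\
00400000-00401000 r-xp 00000000 08:02 1234 /games/swkotor.exe
00401000-00403000 rw-p 00000000 00:00 0
7bc00000-7bc10000 r-xp 00000000 08:02 5678 /usr/lib/wine/ntdll.dll.so
80000000-80010000 ---p 00000000 00:00 0
"""


def mapped_bytes_by_name(maps_text: str) -> Counter:
    """Sum the size of every mapping, grouped by pathname ('[anon]' if none)."""
    totals = Counter()
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else "[anon]"
        totals[name] += end - start
    return totals
```

Given two snapshots `before` and `after` produced this way, `after - before` keeps only the entries that grew, which narrows down whether the increase in VIRT comes from anonymous memory or from a particular mapped file.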
--- Comment #12 from Federico Dossena info@fdossena.com --- (In reply to Stefan Dösinger from comment #10)
> 3.2 GB should still leave plenty of room. It could be that HeapAlloc (what msvcrt's malloc uses) fails because the heap structures are corrupted. Running with WINEDEBUG=warn+heap might give some clues.
> Wine processes have a huge virtual memory footprint because of areas that wine-preloader blocks early on to keep them available for Windows things that need to be at a certain address. After the regular wine main() function starts it might be too late and some Linux library blocks the .exe's load address.
> Does the game executable set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag? If not, Wine will block the entire area from 0x80000000-0xffffffff to behave like old 32 bit Windows that had a 2-2 memsplit and no userland pointer would ever have the highest bit set. Linux will see that as 2GB of address space being in use in the process...
I tried setting WINEDEBUG to that value but it doesn't print anything. Do I need a debug build?
KOTOR is not large address aware. I tried forcing that flag but it doesn't delay the crash or anything.
--- Comment #13 from Stefan Dösinger stefan@codeweavers.com --- WINEDEBUG=warn+heap doesn't write anything unless it detects corruption. It enables some extra code that will fill freed heap memory with 0xfeeefeee and newly allocated memory with 0xfeedfeed (or similar, not sure). It also verifies some extra canary values when operating on heap allocations.
The fact that making it large address aware doesn't help suggests that heap use is not the issue here. Is the virtual memory size still in the 3+ GB range at the time of the crash? Heap corruption is pretty unlikely too. warn+heap is not guaranteed to catch it, but it is very likely to.
So next guess would be a race condition. Did you try to force it to one CPU core with taskset? (e.g. taskset -c 1 wine kotor.exe)
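
To illustrate why a free-fill pattern like the one described for warn+heap helps catch writes to freed memory, here is a toy model in Python. It is not Wine's heap code: `ToyHeap` and its methods are made up for illustration, and only the 0xfeeefeee value is taken from the comment above:

```python
# 0xfeeefeee stored little-endian, the free-fill value mentioned above.
FREE_FILL = bytes.fromhex("eefeeefe")


class ToyHeap:
    """Bump allocator over a bytearray that poisons blocks when they are freed."""

    def __init__(self, size: int):
        self.mem = bytearray(size)
        self.top = 0
        self.freed = []  # (offset, size) of every freed block

    def alloc(self, size: int):
        size = (size + 3) & ~3  # round up to a multiple of 4
        if self.top + size > len(self.mem):
            return None  # out of space, like a failed malloc
        offset = self.top
        self.top += size
        return offset

    def free(self, offset: int, size: int):
        size = (size + 3) & ~3
        self.mem[offset:offset + size] = FREE_FILL * (size // 4)
        self.freed.append((offset, size))

    def validate(self):
        """Return offsets of freed blocks whose fill pattern was overwritten."""
        return [off for off, size in self.freed
                if self.mem[off:off + size] != FREE_FILL * (size // 4)]
```

A write through a stale pointer disturbs the pattern, so a later `validate()` call reports the block. A stale read goes unnoticed by the check itself, but the reader gets a recognisable 0xfeeefeee value instead of plausible-looking data.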
--- Comment #14 from Federico Dossena info@fdossena.com --- (In reply to Stefan Dösinger from comment #13)
> WINEDEBUG=warn+heap doesn't write anything unless it detects corruption. It enables some extra code that will fill freed heap memory with 0xfeeefeee and newly allocated memory with 0xfeedfeed (or similar, not sure). It also verifies some extra canary values when operating on heap allocations.
> The fact that making it large address aware doesn't help suggests that heap use is not the issue here. Is the virtual memory size still in the 3+ GB range at the time of the crash? Heap corruption is pretty unlikely too. warn+heap is not guaranteed to catch it, but it is very likely to.
> So next guess would be a race condition. Did you try to force it to one CPU core with taskset? (e.g. taskset -c 1 wine kotor.exe)
Yes, I tried running the game with a single core and it doesn't fix it.
Since the game is statically linked, I made a very simple hack that doubles the requested memory when malloc is called (I can provide a file diff if you want). The game runs normally but obviously uses a bit more memory. However, it still crashes after a few load screens, at the exact same point, with an access violation instead of a null pointer.
So my initial findings might have been wrong (after all, reverse engineering really isn't my thing), and I think that this might be a use-after-free. I tried using valgrind to prove this, but it won't start for some reason. Is there any other way to debug this?
--- Comment #15 from Stefan Dösinger stefan@codeweavers.com --- Use after free should be caught by WINEDEBUG=warn+heap.
Add some extra ERR lines to HeapAlloc to make sure you're actually getting a NULL allocation back. If you do, you can see what the parameters are and why it is failing.
If you suspect use after free you can try to make HeapFree do nothing. You might die from out of memory though.
There are other alloc APIs, most importantly VirtualAlloc and GlobalAlloc. But the first one is usually not used for regular work allocations (but e.g. for allocating memory for dynamically generated code or hardware I/O) and GlobalAlloc is more a Win16 thing if I am not mistaken.
--- Comment #16 from Federico Dossena info@fdossena.com --- (In reply to Stefan Dösinger from comment #15)
> Use after free should be caught by WINEDEBUG=warn+heap.
> Add some extra ERR lines to HeapAlloc to make sure you're actually getting a NULL allocation back. If you do, you can see what the parameters are and why it is failing.
> If you suspect use after free you can try to make HeapFree do nothing. You might die from out of memory though.
> There are other alloc APIs, most importantly VirtualAlloc and GlobalAlloc. But the first one is usually not used for regular work allocations (but e.g. for allocating memory for dynamically generated code or hardware I/O) and GlobalAlloc is more a Win16 thing if I am not mistaken.
The game seems to use the same malloc and free functions, so I was able to modify them.
I'm attaching a 7z file containing the diff files to apply to swkotor.exe to do the double mallocs and to disable the free function.
The game still crashes at the same location, so I guess it has nothing to do with memory allocations. I noticed 2 things, however. The crash is always at the end of loading, which is where some large textures are allocated and pbuffers are used. This used to be a problem with Mesa, although they seem to have fixed it over a year ago. Also, the terminal says something about a WGL function being a partial stub. Could this be the problem?
--- Comment #17 from Federico Dossena info@fdossena.com --- Created attachment 66299 --> https://bugs.winehq.org/attachment.cgi?id=66299 .dif files for swkotor.exe
--- Comment #18 from Federico Dossena info@fdossena.com --- Created attachment 66300 --> https://bugs.winehq.org/attachment.cgi?id=66300 Modified swkotor.exe to alter malloc and free behavior
I'm also adding modified exe files in case your swkotor.exe is not the same as mine.
--- Comment #19 from Alexandre Julliard julliard@winehq.org --- The content of attachment 66300 has been deleted for the following reason:
Please don't attach copyrighted binaries
--- Comment #20 from joaopa jeremielapuree@yahoo.fr --- Looks like the bug is fixed with wine-7.0-rc3. Can anyone confirm that this is actually the case?
--- Comment #21 from Federico Dossena info@fdossena.com --- It turned out to be a Mesa bug; they fixed it around July.
--- Comment #22 from joaopa jeremielapuree@yahoo.fr --- So, can you close this as NOTOURBUG?
--- Comment #23 from Nikolay Sivov bunglehead@gmail.com --- What would be useful is a commit or bug link to the corresponding Mesa issue.
Federico Dossena info@fdossena.com changed:
What      |Removed     |Added
----------------------------------------------------------------------------
Status    |UNCONFIRMED |RESOLVED
Resolution|---         |NOTOURBUG
--- Comment #24 from Federico Dossena info@fdossena.com --- It was a Mesa bug, fixed around version 21.2 (see force_gl_names_reuse)
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11527
Paul Gofman pgofman@codeweavers.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |pgofman@codeweavers.com
--- Comment #25 from Paul Gofman pgofman@codeweavers.com --- Strictly speaking, it was an app bug depending on undefined behaviour (texture names always being small enough numbers) which used to work in GL as texture numbers used to be allocated sequentially and reused in the majority of GL implementations / drivers.
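
The failure mode described here can be modelled in a few lines. The following Python sketch is a simplified model, not the game's or Mesa's actual code: `TABLE_SIZE`, both allocator classes, and `load_levels` are hypothetical. The point it shows is that an application indexing a fixed-size table by texture name keeps working while names are reused, and breaks after a few loads once names keep growing:

```python
TABLE_SIZE = 16  # hypothetical capacity of an app-side table indexed by texture name


class ReusingNames:
    """Hands out the lowest free name, so deleted names come back."""

    def __init__(self):
        self.in_use = set()

    def gen(self) -> int:
        name = 1
        while name in self.in_use:
            name += 1
        self.in_use.add(name)
        return name

    def delete(self, name: int) -> None:
        self.in_use.discard(name)


class MonotonicNames:
    """Never reuses a name; equally valid, since names are not guaranteed small."""

    def __init__(self):
        self.next_name = 1

    def gen(self) -> int:
        name = self.next_name
        self.next_name += 1
        return name

    def delete(self, name: int) -> None:
        pass


def load_levels(allocator, loads: int, textures_per_level: int = 4):
    """Return the load number at which a name no longer fits the table, else None."""
    for load in range(1, loads + 1):
        names = [allocator.gen() for _ in range(textures_per_level)]
        if any(name >= TABLE_SIZE for name in names):
            return load
        for name in names:
            allocator.delete(name)
    return None
```

With these numbers, `load_levels(ReusingNames(), 10)` never overflows, while `load_levels(MonotonicNames(), 10)` overflows on the fourth load. That matches the pattern reported in this bug, where the crash only appears after several level transitions.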
Austin English austinenglish@gmail.com changed:
What  |Removed  |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
--- Comment #26 from Austin English austinenglish@gmail.com --- In any case, now fixed, so closing.