[Bug 48482] New: Star Wars Knights of the Old Republic randomly crashes after failed malloc
https://bugs.winehq.org/show_bug.cgi?id=48482

            Bug ID: 48482
           Summary: Star Wars Knights of the Old Republic randomly crashes after failed malloc
           Product: Wine
           Version: 5.0-rc6
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: winelib
          Assignee: wine-bugs(a)winehq.org
          Reporter: info(a)fdossena.com
      Distribution: ArchLinux

Created attachment 66272
  --> https://bugs.winehq.org/attachment.cgi?id=66272
winedbg output of the crash

I'm trying to play Star Wars Knights of the Old Republic (KOTOR, for short) and the game randomly crashes on loading screens when played in Wine. The crash is a null pointer exception (see attached kotorcrash.txt). Nothing special appears in the terminal; the crash was captured using winedbg.

The issue can be easily replicated by saving in front of a loading door and going back and forth a few times. It usually happens after 5-10 loads, so during normal gameplay the game crashes every 30 minutes or so, depending on the area.

My reverse engineering skills are minimal, but I know how to use IDA a bit, so I took a peek at the offending code with it (see attached isassembled.png). The instruction can be reached from 2 paths, one of which contains a malloc that I think is failing and returning 0. Why it fails I cannot tell. I tried placing a breakpoint, but it gets called too often to be able to play the game (I need some way to break only if it returns 0, but I don't know how to do it).

I think malloc is part of winelib, so I chose that as the component for this bug report; if it's wrong, please move it to the correct section.

I am willing to help you debug the issue further if you can tell me exactly what to do. I can also provide game files or saved games to replicate the issue if you contact me via email.

Software:
* Manjaro Linux 18.1.5 KDE x86-64 (also tested on Ubuntu 19.10)
* KOTOR version 1.0.3 from GOG (also tested with Steam and disc versions)
* Wine version 5.0-rc6 built from source (also tested with 4.0.3 stable, 5.0-rc2 and 5.0-rc2 staging from package manager)

Hardware 1:
* AMD Athlon 300G
* 8GB DDR4
* AMD RX Vega 3 graphics (using open source amdgpu driver)

Hardware 2:
* Intel Core i9 9900k
* 64GB DDR4
* nVidia GTX 1080 (using both the open source nouveau driver and the proprietary one)

I've also tried playing in software rendering using Mesa Gallium on LLVMPipe; the issue was still present.

Note: the game currently doesn't work on Intel graphics due to driver issues.

-- 
Do not reply to this email, post in Bugzilla using the above URL to reply.
You are receiving this mail because:
You are watching all bug changes.
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #1 from Federico Dossena <info(a)fdossena.com> --- Created attachment 66273 --> https://bugs.winehq.org/attachment.cgi?id=66273 offending code
https://bugs.winehq.org/show_bug.cgi?id=48482

Federico Dossena <info(a)fdossena.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|winelib                     |-unknown
https://bugs.winehq.org/show_bug.cgi?id=48482

joaopa <jeremielapuree(a)yahoo.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeremielapuree(a)yahoo.fr

--- Comment #2 from joaopa <jeremielapuree(a)yahoo.fr> ---
Can you attach a saved game where the bug occurs. It will ease the way to reproduce the bug.
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #3 from Federico Dossena <info(a)fdossena.com> --- Created attachment 66276 --> https://bugs.winehq.org/attachment.cgi?id=66276 savegame where the bug can be triggered easily. Just keep going back and forth between the door
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #4 from Federico Dossena <info(a)fdossena.com> --- (In reply to joaopa from comment #2)
> Can you attach a saved game where the bug occurs. It will ease the way to reproduce the bug.
Done
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #5 from joaopa <jeremielapuree(a)yahoo.fr> --- I confirm that the bug exists. Unfortunately the backtrace is very poor (even with winedbg)
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #6 from Federico Dossena <info(a)fdossena.com> --- (In reply to joaopa from comment #5)
> I confirm that the bug exists. Unfortunately the backtrace is very poor (even with winedbg)
Is there anything I can do to investigate this further? Do you know how I can put a conditional breakpoint on that malloc so I can see when it's failing?
https://bugs.winehq.org/show_bug.cgi?id=48482

Zebediah Figura <z.figura12(a)gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z.figura12(a)gmail.com

--- Comment #7 from Zebediah Figura <z.figura12(a)gmail.com> ---
Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #8 from Federico Dossena <info(a)fdossena.com> --- (In reply to Zebediah Figura from comment #7)
> Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
Can you help me understand the issue? I noticed that the game's virtual memory is 3.2gb in size, but only a few hundred megabytes are actually used. Is this some kind of memory leak?
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #9 from Federico Dossena <info(a)fdossena.com> --- Would an strace be useful to investigate this issue? I've been trying to use ltrace but it won't attach to the kotor process, only to wineserver. I want to intercept mallocs and frees to see if there's anything interesting
https://bugs.winehq.org/show_bug.cgi?id=48482

Stefan Dösinger <stefan(a)codeweavers.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefan(a)codeweavers.com

--- Comment #10 from Stefan Dösinger <stefan(a)codeweavers.com> ---
3.2 GB should still leave plenty of room. It could be that HeapAlloc (what msvcrt's malloc uses) fails because the heap structures are corrupted. Running with WINEDEBUG=warn+heap might give some clues.

Wine processes have a huge virtual memory footprint because of areas wine-preloader blocks early on to keep it available for Windows things that need to be at a certain address. After the regular wine main() function starts it might be too late and some Linux library blocks the .exe's load address.

Does the game executable set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag? If not, Wine will block the entire area from 0x80000000-0xffffffff to behave like old 32 bit Windows that had a 2-2 memsplit and no userland pointer would ever have the highest bit set. Linux will see that as 2GB of address space being in use in the process...
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #11 from Matteo Bruni <matteo.mystral(a)gmail.com> --- (In reply to Federico Dossena from comment #8)
> (In reply to Zebediah Figura from comment #7)
> > Sounds likely the program is running out of virtual address space. There may or may not be anything we can do.
>
> Can you help me understand the issue? I noticed that the game's virtual memory is 3.2gb in size, but only a few hundred megabytes are actually used. Is this some kind of memory leak?
That suggests that something is allocating / reserving memory but not actually using it. I don't know that it tells us anything in particular though. If the VIRT value goes up every time you go through the door then yeah, it does sound like a memory leak. It isn't immediately obvious what's leaking the memory though (e.g. is it the game itself or a specific Wine component?)

(In reply to Federico Dossena from comment #9)
> Would an strace be useful to investigate this issue? I've been trying to use ltrace but it won't attach to the kotor process, only to wineserver. I want to intercept mallocs and frees to see if there's anything interesting
You could get a +heap trace, with additional channels (+wgl,+opengl ?) to figure out what's the source of those allocations. It's going to be pretty huge though. Another thing that might shed some light is /proc/<pid>/maps.
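
One way to act on the /proc/<pid>/maps suggestion is to snapshot the file before and after a few door transitions and compare the totals per mapping. The sketch below is an illustration, not something posted in the thread. The function names are made up for this example, and it assumes the usual Linux maps line layout (address range, permissions, offset, device, inode, optional path).

```python
import sys
from collections import Counter


def summarize_maps(maps_text: str) -> Counter:
    """Sum mapped bytes per backing path from /proc/<pid>/maps content."""
    totals = Counter()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # Mappings without a path are anonymous memory (heaps, reservations).
        name = fields[5].strip() if len(fields) > 5 else "[anonymous]"
        totals[name] += end - start
    return totals


def growth(before: Counter, after: Counter) -> dict:
    """Return the mappings whose total size increased between two snapshots."""
    return {k: after[k] - before[k] for k in after if after[k] > before[k]}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python maps_summary.py <pid>
    with open(f"/proc/{sys.argv[1]}/maps") as f:
        for name, size in summarize_maps(f.read()).most_common(10):
            print(f"{size / 2**20:10.1f} MiB  {name}")
```

If one entry keeps growing with every load, that points at a leak; a large but stable total is more consistent with the address-space reservations described in comment #10.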
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #12 from Federico Dossena <info(a)fdossena.com> --- (In reply to Stefan Dösinger from comment #10)
> 3.2 GB should still leave plenty of room. It could be that HeapAlloc (what msvcrt's malloc uses) fails because the heap structures are corrupted. Running with WINEDEBUG=warn+heap might give some clues.
>
> Wine processes have a huge virtual memory footprint because of areas wine-preloader blocks early on to keep it available for Windows things that need to be at a certain address. After the regular wine main() function starts it might be too late and some Linux library blocks the .exe's load address.
>
> Does the game executable set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag? If not, Wine will block the entire area from 0x80000000-0xffffffff to behave like old 32 bit Windows that had a 2-2 memsplit and no userland pointer would ever have the highest bit set. Linux will see that as 2GB of address space being in use in the process...
I tried setting WINEDEBUG to that value but it doesn't print anything. Do I need a debug build? KOTOR is not large address aware. I tried forcing that flag but it doesn't delay the crash or anything.
https://bugs.winehq.org/show_bug.cgi?id=48482

--- Comment #13 from Stefan Dösinger <stefan(a)codeweavers.com> ---
WINEDEBUG=warn+heap doesn't write anything unless it detects corruption. It enables some extra code that will fill freed Heap memory with 0xfeeefeee and newly allocated ones with 0xfeedfeed (or similar. not sure). It also verifies some extra canary values when operating on heap allocations.

That making it large address aware doesn't help suggests that the heap use is not the issue here. Is the virtual memory size still in the 3+ GB range at the time of the crash? Heap corruption is pretty unlikely too. warn+heap is not guaranteed to catch it, but very likely.

So next guess would be a race condition. Did you try to force it to one CPU core with taskset? (e.g. taskset -c 1 wine kotor.exe)
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #14 from Federico Dossena <info(a)fdossena.com> --- (In reply to Stefan Dösinger from comment #13)
> WINEDEBUG=warn+heap doesn't write anything unless it detects corruption. It enables some extra code that will fill freed Heap memory with 0xfeeefeee and newly allocated ones with 0xfeedfeed (or similar. not sure). It also verifies some extra canary values when operating on heap allocations.
>
> That making it large address aware doesn't help suggests that the heap use is not the issue here. Is the virtual memory size still in the 3+ GB range at the time of the crash? Heap corruption is pretty unlikely too. warn+heap is not guaranteed to catch it, but very likely.
>
> So next guess would be a race condition. Did you try to force it to one CPU core with taskset? (e.g. taskset -c 1 wine kotor.exe)
Yes, I tried running the game with a single core and it doesn't fix it. Since the game is statically linked, I made a very simple hack that doubles the requested memory when malloc is called (I can provide a file diff if you want). The game runs normally but obviously uses a bit more memory, however, it still crashes after a few load screens, at the same exact point, with an access violation instead of a null pointer. So my initial findings might have been wrong (after all, reverse engineering really isn't my thing), and I think that this might be a use after free thing. I tried using valgrind to prove this, but it won't start for some reason. Is there any other way to debug this?
https://bugs.winehq.org/show_bug.cgi?id=48482

--- Comment #15 from Stefan Dösinger <stefan(a)codeweavers.com> ---
Use after free should be caught by WINEDEBUG=warn+heap.

Add some extra ERR lines to HeapAlloc to make sure you're actually getting a NULL allocation back. If you do, you can see what the parameters are and why it is failing.

If you suspect use after free you can try to make HeapFree do nothing. You might die from out of memory though.

There are other alloc APIs, most importantly VirtualAlloc and GlobalAlloc. But the first one is usually not used for regular work allocations (but e.g. for allocating memory for dynamically generated code or hardware I/O) and GlobalAlloc is more a Win16 thing if I am not mistaken.
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #16 from Federico Dossena <info(a)fdossena.com> --- (In reply to Stefan Dösinger from comment #15)
> Use after free should be caught by WINEDEBUG=warn+heap.
>
> Add some extra ERR lines to HeapAlloc to make sure you're actually getting a NULL allocation back. If you do, you can see what the parameters are and why it is failing.
>
> If you suspect use after free you can try to make HeapFree do nothing. You might die from out of memory though.
>
> There are other alloc APIs, most importantly VirtualAlloc and GlobalAlloc. But the first one is usually not used for regular work allocations (but e.g. for allocating memory for dynamically generated code or hardware I/O) and GlobalAlloc is more a Win16 thing if I am not mistaken.
The game seems to use the same malloc and free functions, so I was able to modify them. I'm attaching a 7z file containing the diff files to apply to swkotor.exe to do the double mallocs and to disable the free function. The game still crashes at the same location, so I guess it has nothing to do with memory allocations. I noticed 2 things however: the crash is always at the end of the loading, which is where some large textures are allocated and pbuffers are used. This used to be a problem with Mesa, although they seem to have fixed it over a year ago. In the terminal it says something about a WGL function being a partial stub; could this be the problem?
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #17 from Federico Dossena <info(a)fdossena.com> --- Created attachment 66299 --> https://bugs.winehq.org/attachment.cgi?id=66299 .dif files for swkotor.exe
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #18 from Federico Dossena <info(a)fdossena.com> --- Created attachment 66300 --> https://bugs.winehq.org/attachment.cgi?id=66300 Modified swkotor.exe to alter malloc and free behavior I'm also adding modified exe files in case your swkotor.exe is not the same as mine.
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #19 from Alexandre Julliard <julliard(a)winehq.org> --- The content of attachment 66300 has been deleted for the following reason: Please don't attach copyrighted binaries
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #20 from joaopa <jeremielapuree(a)yahoo.fr> --- Looks like the bug is fixed with wine-7.0-rc3. Can anyone confirm it is the case actually?
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #21 from Federico Dossena <info(a)fdossena.com> --- It turned out to be a Mesa bug; they fixed it around July
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #22 from joaopa <jeremielapuree(a)yahoo.fr> --- So, can you close this as NOTOURBUG?
https://bugs.winehq.org/show_bug.cgi?id=48482 --- Comment #23 from Nikolay Sivov <bunglehead(a)gmail.com> --- What would be useful is a commit or bug link to corresponding mesa issue.
https://bugs.winehq.org/show_bug.cgi?id=48482

Federico Dossena <info(a)fdossena.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTOURBUG

--- Comment #24 from Federico Dossena <info(a)fdossena.com> ---
It was a mesa bug, fixed around Version 21.2 (see force_gl_names_reuse)
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11527
https://bugs.winehq.org/show_bug.cgi?id=48482

Paul Gofman <pgofman(a)codeweavers.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pgofman(a)codeweavers.com

--- Comment #25 from Paul Gofman <pgofman(a)codeweavers.com> ---
Strictly speaking, it was an app bug depending on undefined behaviour (texture names always being small enough numbers) which used to work in GL as texture numbers used to be allocated sequentially and reused in the majority of GL implementations / drivers.
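
The failure mode described in comment #25 can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the game's real data structures are not known from this thread, and the table size (1000) and textures per load (150) are invented numbers. It models an application that indexes a fixed-size table by texture name, first with an allocator that reuses freed names and then with one that never does.

```python
class ReusingAllocator:
    """Hands out the lowest free name, so names stay small when textures are freed."""

    def __init__(self):
        self.in_use = set()

    def gen(self):
        name = 1
        while name in self.in_use:
            name += 1
        self.in_use.add(name)
        return name

    def delete(self, name):
        self.in_use.discard(name)


class MonotonicAllocator:
    """Never reuses a name, so names keep growing across load screens."""

    def __init__(self):
        self.next_name = 1

    def gen(self):
        name = self.next_name
        self.next_name += 1
        return name

    def delete(self, name):
        pass  # freed names are simply retired


def loads_until_overflow(allocator, table_size, textures_per_load, max_loads):
    """Return the load number at which a name no longer fits the table, or None."""
    for load in range(1, max_loads + 1):
        names = [allocator.gen() for _ in range(textures_per_load)]
        if any(n >= table_size for n in names):
            return load  # a real program would index out of bounds here
        for n in names:
            allocator.delete(n)  # area unloaded, textures released
    return None


# Hypothetical numbers, chosen only for illustration.
print(loads_until_overflow(ReusingAllocator(), 1000, 150, 50))
print(loads_until_overflow(MonotonicAllocator(), 1000, 150, 50))
```

With name reuse the table never overflows, while without it the overflow arrives after a handful of loads. That would match the pattern of a crash appearing only after several transitions, though the real thresholds in the game are unknown.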
https://bugs.winehq.org/show_bug.cgi?id=48482

Austin English <austinenglish(a)gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #26 from Austin English <austinenglish(a)gmail.com> ---
In any case, now fixed, so closing.