https://bugs.winehq.org/show_bug.cgi?id=52213
Bug ID: 52213 Summary: Tests for mfreadwrite crash after pthread_exit Product: Wine Version: 7.0-rc1 Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: -unknown Assignee: wine-bugs@winehq.org Reporter: gmascellani@codeweavers.com Regression SHA1: 4ba31162c37ea237763e650f6242535d86ffb170 Distribution: Debian
Since commit 4ba31162c37ea237763e650f6242535d86ffb170 test for mfreadwrite crash in a funny way on my system: one of the threads of the tests crashes after calling pthread_exit, which is, as far as I understand, the last place in which we have the control of a dying thread.
The system on which I am trying this is a rather recently installed Debian unstable amd64, for which I did not do any specific tinkering, say, of the dynamic loader, glibc or anything else. So I can't think of anything unusual that could trigger this on my system. However, my first impression is that somehow we corrupt something in the pthread or glibc status so that the teardown goes badly.
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #1 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71274 --> https://bugs.winehq.org/attachment.cgi?id=71274 Patch for testing
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #2 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71275 --> https://bugs.winehq.org/attachment.cgi?id=71275 Trace with the debugging patch
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #3 from Giovanni Mascellani gmascellani@codeweavers.com --- Applying https://bugs.winehq.org/attachment.cgi?id=71274 you can obtain a trace similar to https://bugs.winehq.org/attachment.cgi?id=71275. As you can see, thread 108 calls pthread_exit and later receives a couple of segfaults.
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |gmascellani@codeweavers.com
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #71274|0 |1 is obsolete| |
--- Comment #4 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71292 --> https://bugs.winehq.org/attachment.cgi?id=71292 New debugging patch
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #71275|0 |1 is obsolete| |
--- Comment #5 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71293 --> https://bugs.winehq.org/attachment.cgi?id=71293 Trace for amd64
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #6 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71294 --> https://bugs.winehq.org/attachment.cgi?id=71294 Trace for i386
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #7 from Giovanni Mascellani gmascellani@codeweavers.com --- I updated the debugging patch so that the crash is displayed in a minimal example, without involving winegstreamer which is not the real problem here.
The crash seems to happen because pthread_exit() is called inside a signal handler (which is executed in an alternate stack). Basically, the only thing pthread_exit() is supposed to do is to launch an internal exception to unwind the stack, and it might be unable to properly handle being on an alternate stack.
The new debugging patch has a sleep() call before exit(), because otherwise the exit_group() call might arrive before SIGSEGV is delivered and the segmentation fault might be hidden. With this sleep() call the crash is consistently reproducible on my system.
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Tests for mfreadwrite crash |Thread crashes because |after pthread_exit |pthread_exit is called in a | |signal handler
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #8 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71295 --> https://bugs.winehq.org/attachment.cgi?id=71295 Crash demonstration on a pure-Linux program
Compile with -pthreads.
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #71295|0 |1 is obsolete| |
--- Comment #9 from Giovanni Mascellani gmascellani@codeweavers.com --- Created attachment 71296 --> https://bugs.winehq.org/attachment.cgi?id=71296 Attempt to reproduce the crash without Wine
(previous version had a too small stack)
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #10 from Giovanni Mascellani gmascellani@codeweavers.com --- I tried to reproduce the same crash in a pure C program (without all the Wine instrumentation) and it doesn't happen: the program creates a thread, installs an alternate signal stack, waits on a variable condition, is killed with SIGQUIT and calls pthread_exit, and all of this seems to work properly. So it seems that there is something Wine-specific leading to the crash.
https://bugs.winehq.org/show_bug.cgi?id=52213
Giovanni Mascellani gmascellani@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Thread crashes because |Thread crashes when |pthread_exit is called in a |pthread_exit is called in a |signal handler |SIGQUIT handler
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #11 from Giovanni Mascellani gmascellani@codeweavers.com --- I think I'm getting an idea of what is happening here: when a thread is started, its stack pointer is stored in amd64_thread_data()->exit_frame by signal_start_thread() and syscall_frame is used whenever the execution flow is in a UNIX library. When exiting a thread, signal_exit_thread() resets the stack to the exit_frame and then calls pthread_exit().
The problem is that pthread_exit() doesn't like this arrangement: basically the only thing pthread_exit() does is unwinding the thread's stack, so that all cleanup procedures, registered for example with pthread_cleanup_push(), are executed. But since we already unwound the stack under its feet (and possibly rewrote a part of it inside the SIGQUIT handler), this doesn't work anymore.
In the particular case on my test program, inside pthread_cond_wait() a cleanup procedure is established (to properly abort the wait if the stack is unwound), and since its address is overwritten inside the SIGQUIT handler, a segmentation fault happens.
I am not sure of what is the correct way to fix this problem. I'd be tempted to say that we can do away with exit_frame and just let pthread unwind the stack, but I am not sure what exit_frame is supposed to help with. Why aren't we using the stack allocated by pthread, instead of transitioning to our own syscall stack?
https://bugs.winehq.org/show_bug.cgi?id=52213
Rémi Bernon rbernon@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |rbernon@codeweavers.com
--- Comment #12 from Rémi Bernon rbernon@codeweavers.com --- Although there's possibly some stack smashing happening, I believe the main issue that makes pthread_exit fail to unwind is that it loses track of the stack pointer in the syscall frame, as we don't have .cfi instructions there and as we're overwriting all the registers and swapping stack pointers.
I think we could let pthread unwind the unix-side stack, by making sure the .cfi instructions point to the unix-side frames only. I implemented such a change and sent it to the M-L as https://source.winehq.org/patches/data/225920, https://source.winehq.org/patches/data/225921, https://source.winehq.org/patches/data/225922, and https://source.winehq.org/patches/data/225923. Somehow what libunwind really (and only) needs is to find %rip, although I'm not completely sure why is that.
https://bugs.winehq.org/show_bug.cgi?id=52213
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |fgouget@codeweavers.com
--- Comment #13 from Alexandre Julliard julliard@winehq.org --- *** Bug 55283 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #14 from Zeb Figura z.figura12@gmail.com --- *** Bug 55283 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #15 from Zeb Figura z.figura12@gmail.com --- *** Bug 54095 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=52213
Michael Stefaniuc mstefani@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC| |mstefani@winehq.org
--- Comment #16 from Michael Stefaniuc mstefani@winehq.org --- For me it is winealsa.drv that produces the crashes in some of the DMusic tests. Disabling winealsa.drv via winecfg works around the issue.
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #17 from Michael Stefaniuc mstefani@winehq.org --- Oh, and I experience the crashes in current devel (for a couple of versions now) but not in 8.0.x. Is there any value in running a regression test for that? This bug pre-dates 8.0.
https://bugs.winehq.org/show_bug.cgi?id=52213
--- Comment #18 from Rémi Bernon rbernon@codeweavers.com --- Probably not, the outcome is random and depends on the stack layout and stale frames, it most often crashes but can also loop forever or anything.
https://bugs.winehq.org/show_bug.cgi?id=52213
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |matthew.mcallister.0@gmail. | |com
--- Comment #19 from Zeb Figura z.figura12@gmail.com --- *** Bug 55641 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=52213
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Regression SHA1|4ba31162c37ea237763e650f624 | |2535d86ffb170 |
https://bugs.winehq.org/show_bug.cgi?id=52213
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |source, testcase
https://bugs.winehq.org/show_bug.cgi?id=52213
Brendan McGrath brendan@redmandi.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |brendan@redmandi.com
--- Comment #20 from Brendan McGrath brendan@redmandi.com --- *** Bug 53556 has been marked as a duplicate of this bug. ***
https://bugs.winehq.org/show_bug.cgi?id=52213
Julian Rüger jr98@gmx.net changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |jr98@gmx.net
https://bugs.winehq.org/show_bug.cgi?id=52213
Rémi Bernon rbernon@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|NEW |RESOLVED
--- Comment #21 from Rémi Bernon rbernon@codeweavers.com --- This is fixed since some time now.
https://bugs.winehq.org/show_bug.cgi?id=52213
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #22 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 9.4.