https://bugs.winehq.org/show_bug.cgi?id=51442
Bug ID: 51442
Summary: A networking application misbehaves and causes 100% CPU usage in wineserver
Product: Wine
Version: 6.12
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: winsock
Assignee: wine-bugs@winehq.org
Reporter: rpisl@seznam.cz
Distribution: ---
Created attachment 70293 --> https://bugs.winehq.org/attachment.cgi?id=70293 WINEDEBUG=+winsock trace BAD
Since commit 414b31bc0bbbfe005e90a1946a649082dc303c55, a networking application misbehaves and causes 100% CPU usage in wineserver, so this is a regression. Reverting commits 414b31bc0bbbfe005e90a1946a649082dc303c55 and 1ccab719ee6e87b7399876d4d5b30eb889c49e32 makes it work again.
The setup is complex, but I'll try to provide useful information and testing, and eventually extract a minimal reproducible setup.
Roman Pišl rpisl@seznam.cz changed:
What | Removed | Added
----------------------------------------------------------------------------
Distribution | --- | Debian
Regression SHA1 | | 414b31bc0bbbfe005e90a1946a649082dc303c55
CC | | z.figura12@gmail.com
Keywords | | regression
--- Comment #1 from Roman Pišl rpisl@seznam.cz --- Created attachment 70294 --> https://bugs.winehq.org/attachment.cgi?id=70294 WINEDEBUG=+winsock trace GOOD
Same run with the two commits reverted.
--- Comment #2 from Zebediah Figura z.figura12@gmail.com --- Should hopefully be fixed by https://source.winehq.org/git/wine.git/commitdiff/361435f6095f8c759979600b06ed28785e7b3aec or https://source.winehq.org/git/wine.git/commitdiff/9bc5bc7c6628a69cef6e64facb8eb7e3cf2e269b; please retest with current git or with wine 6.14 when it is released.
--- Comment #3 from Roman Pišl rpisl@seznam.cz --- Unfortunately, that is not the case. I tried git HEAD with a complete rebuild, but the behavior is still the same. Reverting the two patches is still needed to fix the regression, to get the same behavior as on Windows, and to keep wineserver from taking 100% CPU.
I'm aware that the information I've provided is insufficient. Please give me some time to prepare a publishable example that reproduces this.
--- Comment #4 from Zebediah Figura z.figura12@gmail.com --- A +winsock,+server trace would also work, actually.
ihxy ayafcc@163.com changed:
What | Removed | Added
----------------------------------------------------------------------------
CC | | ayafcc@163.com
--- Comment #5 from ihxy ayafcc@163.com --- I have the same problem using wework in Wine 6.14.
Roman Pišl rpisl@seznam.cz changed:
What | Removed | Added
----------------------------------------------------------------------------
URL | | https://download.rexcontrols.cz/files/test/wine-bug51442-reproduce.zip
--- Comment #6 from Roman Pišl rpisl@seznam.cz --- This is a race condition that is hard to debug. At least I have prepared a test that reproduces the problem. Unfortunately, it is rather complex so far.
--- Comment #7 from Roman Pišl rpisl@seznam.cz --- I also occasionally encounter the following error with cmake+clang compilation under Wine:
sendmsg: An operation was attempted on something that is not a socket.
Maybe it is related? Is WSAENOTSOCK being returned erroneously somewhere?
Roman Pišl rpisl@seznam.cz changed:
What | Removed | Added
----------------------------------------------------------------------------
Resolution | --- | INVALID
Status | UNCONFIRMED | RESOLVED
--- Comment #8 from Roman Pišl rpisl@seznam.cz --- This may as well be a hidden bug in the application; don't waste precious time on it. I will eventually prepare a simple test case if it turns out to be a Wine bug.
Roman Pišl rpisl@seznam.cz changed:
What | Removed | Added
----------------------------------------------------------------------------
Resolution | INVALID | ---
Status | RESOLVED | UNCONFIRMED
Summary | A networking application misbehaves and causes 100% CPU usage in wineserver | Socket connection is not established properly
--- Comment #9 from Roman Pišl rpisl@seznam.cz --- I am reopening this because I have probably found a valid Wine trace log. The bug was previously called "A networking application misbehaves and causes 100% CPU usage in wineserver", but the CPU usage was fixed by later commits and no longer occurs with a fresh wineprefix.
--- Comment #10 from Roman Pišl rpisl@seznam.cz --- Created attachment 71313 --> https://bugs.winehq.org/attachment.cgi?id=71313 WINEDEBUG=+winsock trace with comments
This is WINEDEBUG=+winsock output annotated with my attempt to work out what is going on; hopefully it is accurate.
Gabriel Ivăncescu gabrielopcode@gmail.com changed:
What | Removed | Added
----------------------------------------------------------------------------
CC | | gabrielopcode@gmail.com
--- Comment #11 from Gabriel Ivăncescu gabrielopcode@gmail.com --- This is a genuine regression. It happens pretty consistently in Firefox or Pale Moon (I tested 32-bit versions only).
To reproduce, just go to mail.google.com, possibly log in to your Gmail account, and browse some mail and the categories on the left. At some point, it will stop loading as if you're offline. Once this bug is triggered, no other connections work: if you try to go to any other website from the URL bar, it will not connect but hang indefinitely.
Sometimes, it starts hanging as soon as mail.google.com is loaded, and then of course no other connection works anymore, but that happens mostly on Firefox rather than Pale Moon. By "hang" I mean the connections hang, not the rest of the browser.
I've bisected it to this exact commit, but unfortunately my skills in this area are lacking so I can't really figure out what's wrong.
--- Comment #12 from Roman Pišl rpisl@seznam.cz --- (In reply to Gabriel Ivăncescu from comment #11)
> This is a genuine regression. It happens pretty consistently in Firefox or Pale Moon (I tested 32-bit versions only).
Good to hear that there is another way to reproduce.
> Sometimes, it starts hanging as soon as mail.google.com is loaded, and then of course no other connection works anymore, but that happens mostly on Firefox rather than Pale Moon. By "hang" I mean the connections hang, not the rest of the browser.
I observe the same symptoms. Also sometimes (but not always) all new socket connections are broken until wineserver is restarted.
--- Comment #13 from Zebediah Figura z.figura12@gmail.com --- The minimal test application would probably be easier to debug, but it seems that the link is dead.
Roman Pišl rpisl@seznam.cz changed:
What | Removed | Added
----------------------------------------------------------------------------
URL | https://download.rexcontrols.cz/files/test/wine-bug51442-reproduce.zip |
--- Comment #14 from Roman Pišl rpisl@seznam.cz --- Hi Zebediah, I tried a recent Firefox as mentioned in comment 11 and it seems to be the same case. If that doesn't help, I'll prepare a simpler test case in the following days that reproduces what I posted in comment 10.
--- Comment #15 from Roman Pišl rpisl@seznam.cz --- It seems that bug 51648 is a duplicate of this bug. Loading the YouTube page works up to 414b31bc0bbbfe005e90a1946a649082dc303c55 and still fails with git master (playing a video does not work either, but that's a different bug).
--- Comment #16 from Zebediah Figura z.figura12@gmail.com --- FWIW, I'm not perfectly convinced that the test application and firefox suffer from the same bug either. They might be, but I'll believe it when I see it. There are a lot of ways for socket connection to go wrong.
--- Comment #17 from Roman Pišl rpisl@seznam.cz --- This bug can be triggered quite reliably without hitting other bugs with:
firefox.exe -private -devtools https://www.phoronix.com
It is very likely that at least one connection remains stalled and no other content can be downloaded from then on.
I'm also working on a simple and reliable example to reproduce the bug but without success so far.
--- Comment #18 from Zebediah Figura z.figura12@gmail.com --- The Firefox hang appears to be due to stack corruption, from passing a fd_set that is larger than FD_SETSIZE. Since I don't see any such symptoms in the log from comment 10, I'm going to assume that it's a different bug, and I've filed bug 52302 accordingly.
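For context on the fd_set issue: a Winsock fd_set is a count (fd_count) followed by a fixed array of FD_SETSIZE socket handles, 64 by default, and an application may define a larger FD_SETSIZE before including winsock2.h. A callee that copies such a set into a buffer sized for the default can then write past the end of its stack buffer. The sketch below is a hypothetical illustration on plain C types; the names win_fd_set_64, win_fd_set_1024, sock_handle and copy_fd_set_checked are invented and this is not the actual Wine code or fix.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the Winsock fd_set layout:
 * a count followed by a fixed array of socket handles. */
#define DEFAULT_FD_SETSIZE 64

typedef uintptr_t sock_handle; /* stand-in for SOCKET */

typedef struct {
    unsigned int fd_count;
    sock_handle fd_array[DEFAULT_FD_SETSIZE];
} win_fd_set_64;

/* What an application compiled with a larger FD_SETSIZE (here 1024) passes. */
typedef struct {
    unsigned int fd_count;
    sock_handle fd_array[1024];
} win_fd_set_1024;

/* Copy `count` handles from `src` into `dst`, which holds `capacity` entries.
 * Returns 0 on success, or -1 if the set does not fit, instead of
 * writing past the end of `dst`. */
static int copy_fd_set_checked(sock_handle *dst, size_t capacity,
                               unsigned int count, const sock_handle *src)
{
    if (count > capacity)
        return -1; /* an unchecked memcpy here would overflow dst */
    memcpy(dst, src, count * sizeof(*src));
    return 0;
}
```

The point of the check is that the callee trusts fd_count only after comparing it to its own buffer size, rather than assuming every caller used the default FD_SETSIZE.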
--- Comment #19 from Roman Pišl rpisl@seznam.cz --- Hi Zebediah, you were right: the issue is not the same. But thanks to your recent fixes, it is starting to become clearer! The problem occurs when connecting a socket in non-blocking mode. The connection fails multiple times with WSAECONNREFUSED (10061), but why, given that it is on localhost? Then, if it succeeds, the app switches the socket back to blocking mode, but the socket is never marked as ready for writing. I'll compare the behavior with Windows and prepare a test after the weekend.
--- Comment #20 from Roman Pišl rpisl@seznam.cz --- What the occasionally failing program does:
1. Spawns another process (which takes some time to initialize and finally listens on a local TCP socket).
2. Makes the socket non-blocking: ioctlsocket(fd, FIONBIO, 1).
3. Connects to #1 and tests the result: ok or error -> done; if WSAEWOULDBLOCK, continues:
4. select(.., NULL, wfdset, timeout)
5. getsockopt(fd, SOL_SOCKET, SO_ERROR, ..)
6. SO_ERROR != 0 && SO_ERROR != WSAEWOULDBLOCK -> error
7. fd ready for write -> #8, else -> #4
8. Makes the socket blocking again: ioctlsocket(fd, FIONBIO, 0).
9. select(.., NULL, wfdset, ..)
10. send()
This sometimes gets to #9 but loops there forever; the socket is never marked as ready for writing again.
Also performing #5 and #6 before #4 fixes this. Either that really fixes the problem, or it changes the timing and hides the real problem.
Since a workaround exists and the problem is hard to reproduce, it is not critical. I will test again once in a while.
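To make the sequence concrete, here is a minimal sketch of steps 2 to 8 as a POSIX (Linux) analogue, not the reporter's actual code: fcntl(O_NONBLOCK) stands in for ioctlsocket(FIONBIO) and EINPROGRESS for WSAEWOULDBLOCK. The helper names set_nonblocking and connect_with_timeout are hypothetical, and unlike step 7 it gives up with ETIMEDOUT instead of looping forever. On POSIX, a finished non-blocking connect is reported as writable whether it succeeded or failed, so the result must be read from SO_ERROR; Microsoft's select() documentation instead reports a failed non-blocking connect in exceptfds and only a successful one in writefds.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set or clear O_NONBLOCK (POSIX stand-in for ioctlsocket(fd, FIONBIO, ..)). */
static int set_nonblocking(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical helper mirroring steps 2-8: returns 0 once connected (socket
 * left in blocking mode), otherwise an errno value such as ECONNREFUSED. */
static int connect_with_timeout(int fd, const struct sockaddr_in *addr,
                                int timeout_ms)
{
    if (set_nonblocking(fd, 1) < 0)                          /* step 2 */
        return errno;
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
        if (errno != EINPROGRESS)                            /* step 3 */
            return errno;
        for (;;) {
            fd_set wfds;
            struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            int n = select(fd + 1, NULL, &wfds, NULL, &tv);  /* step 4 */
            if (n < 0 && errno == EINTR)
                continue;                                    /* interrupted: wait again */
            if (n < 0)
                return errno;
            if (n == 0)
                return ETIMEDOUT;                            /* give up instead of looping */
            int err = 0;
            socklen_t len = sizeof(err);
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) /* step 5 */
                return errno;
            if (err != 0)                                    /* step 6 */
                return err;
            break;                                           /* step 7: writable, no error */
        }
    }
    if (set_nonblocking(fd, 0) < 0)                          /* step 8 */
        return errno;
    return 0;
}
```

Connecting to a loopback port that has a listener should return 0, while a port with no listener typically yields ECONNREFUSED, either straight from connect() or via SO_ERROR after select() wakes up.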
--- Comment #21 from Zebediah Figura z.figura12@gmail.com --- Can you by any chance reproduce this with +winsock,+server?
--- Comment #22 from Roman Pišl rpisl@seznam.cz --- (In reply to Zebediah Figura from comment #21)
> Can you by any chance reproduce this with +winsock,+server?
OK, I'll run future experiments with these options and see what they catch. Unfortunately, +server changes the timing considerably (the timing is a problem inside Wine; the application itself is single-threaded).
--- Comment #23 from Roman Pišl rpisl@seznam.cz --- Created attachment 71520 --> https://bugs.winehq.org/attachment.cgi?id=71520 WINEDEBUG=+winsock,+server
Trace log with +winsock,+server
--- Comment #24 from Zebediah Figura z.figura12@gmail.com --- I suspect, but haven't confirmed, that we're racing between sock_error() and poll(). It looks like the connection fails, signaling AFD_POLL_WRITE (is this correct?) while also returning STATUS_CONNECTION_REFUSED. select() throws that status away because it doesn't care, though, and after the request completes the program checks SO_ERROR, but the error has already been swallowed.
I'm not sure it's correct that we're signaling AFD_POLL_WRITE, and even if it is we need to check whether select() should be signalling the writefd and returning success here.
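The suspected race can be pictured with a toy model. Everything below is hypothetical: the struct and function names are invented and do not correspond to Wine's sock_error() or poll code. The model assumes a pending connect error that is cleared the first time it is read, as SO_ERROR is. In the "buggy" path, a finished connect always reports writable and the error is consumed while completing the request; in the other path, the behavior described in Microsoft's select() documentation is followed (success goes to writefds, failure to exceptfds) and the error stays pending.

```c
#include <stdbool.h>

/* Toy socket: a connect error that is cleared when read (like SO_ERROR). */
struct toy_sock {
    int pending_error;  /* 0 = none, otherwise e.g. 10061 (WSAECONNREFUSED) */
    bool connect_done;  /* the connect attempt has finished, successfully or not */
};

struct toy_poll_result {
    bool writable;
    bool except;
};

/* Reading the error clears it, mirroring getsockopt(SO_ERROR). */
static int toy_take_error(struct toy_sock *s)
{
    int err = s->pending_error;
    s->pending_error = 0;
    return err;
}

/* Suspected behavior: a finished connect is always reported as writable,
 * and the error is consumed while completing the request, so a later
 * SO_ERROR query sees 0. */
static struct toy_poll_result toy_poll_buggy(struct toy_sock *s)
{
    struct toy_poll_result r = { false, false };
    if (s->connect_done) {
        r.writable = true;
        (void)toy_take_error(s); /* status thrown away */
    }
    return r;
}

/* Windows-documented semantics: success -> writefds, failure -> exceptfds,
 * and the error is left pending for the application to read. */
static struct toy_poll_result toy_poll_fixed(const struct toy_sock *s)
{
    struct toy_poll_result r = { false, false };
    if (s->connect_done) {
        if (s->pending_error)
            r.except = true;
        else
            r.writable = true;
    }
    return r;
}
```

In the buggy path, an application following steps 4 to 8 from comment 20 would see the socket as writable with SO_ERROR equal to 0, conclude it is connected, and then wait in step 9 on a socket that never connected. That is consistent with the reported endless loop, although the model does not prove that this is what Wine does.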
--- Comment #25 from Zebediah Figura z.figura12@gmail.com --- Created attachment 71537 --> https://bugs.winehq.org/attachment.cgi?id=71537 avoid reporting POLLOUT on connection failure
Does the attached patch help? It's not a complete solution, but it should hopefully fix the issue for now.
--- Comment #26 from Roman Pišl rpisl@seznam.cz ---
> Does the attached patch help? It's not a complete solution, but it should hopefully fix the issue for now.
Yes, it does help!
Zebediah Figura z.figura12@gmail.com changed:
What | Removed | Added
----------------------------------------------------------------------------
Fixed by SHA1 | | 51e5995d47b7de9a2d0d6a40f7eb3e3c11b83cf2
Status | UNCONFIRMED | RESOLVED
Resolution | --- | FIXED
--- Comment #27 from Zebediah Figura z.figura12@gmail.com --- Fixed by https://source.winehq.org/git/wine.git/commitdiff/51e5995d47b7de9a2d0d6a40f7eb3e3c11b83cf2.
Alexandre Julliard julliard@winehq.org changed:
What | Removed | Added
----------------------------------------------------------------------------
Status | RESOLVED | CLOSED
--- Comment #28 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 7.0-rc6.