https://bugs.winehq.org/show_bug.cgi?id=38399
Bug ID: 38399 Summary: Voobly fails to connect to lobby (regression) Product: Wine Version: 1.7.40 Hardware: x86 URL: http://www.voobly.com/ OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: winsock Assignee: wine-bugs@winehq.org Reporter: michael@fds-team.de CC: 00cpxxx@gmail.com Regression SHA1: b2556a2c34f36c0c82f028dbf24acb1ed77cce29 Distribution: ---
Created attachment 51252 --> https://bugs.winehq.org/attachment.cgi?id=51252 +winsock log
Voobly fails to connect to a lobby and instead prints the following error message with 1.7.40 (while it works with 1.7.39):
"Network Connection Error [Code: 10000005]. Connection errors are typically caused by a temporary server outage, a firewall, or a problem with your internet connection. If it's been a long time since you've connected to Voobly it's recommended that you redownload the client from the website."
In order to reproduce the issue: start Voobly, login into your account and double click on any lobby (for example "Age of Empires -> Babylon"). The game itself is not necessary to trigger this problem.
A bisect results in the following patch:
------ commit b2556a2c34f36c0c82f028dbf24acb1ed77cce29 Author: Bruno Jesus 00cpxxx@gmail.com Date: Sun Mar 29 00:53:43 2015 -0300
ws2_32: Ensure sockets in exceptfds get set when an error occurs. ------
The bug is still present in the current git version and is therefore not a duplicate of bug 38372.
https://bugs.winehq.org/show_bug.cgi?id=38399
Michael Müller michael@fds-team.de changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |download, regression CC| |michael@fds-team.de
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #1 from Bruno Jesus 00cpxxx@gmail.com --- Thanks for the report, I expected that the select changes would bring regressions, this is the reason I didn't close bug 9425 yet.
https://bugs.winehq.org/show_bug.cgi?id=38399
Bruno Jesus 00cpxxx@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |ASSIGNED Assignee|wine-bugs@winehq.org |00cpxxx@gmail.com Summary|Voobly fails to connect to |Voobly fails to connect to |lobby (regression) |lobby (threaded app close | |socket in the middle of | |other thread select call) Ever confirmed|0 |1
--- Comment #2 from Bruno Jesus 00cpxxx@gmail.com --- Ok, confirming. Will send a patch shortly.
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #3 from Bruno Jesus 00cpxxx@gmail.com --- Created attachment 51263 --> https://bugs.winehq.org/attachment.cgi?id=51263 patch
The attached patch solves the issue, but I'm looking for cheaper ways. The problem is that all descriptors are duplicated (dup call) for reasons I don't understand so when we call select with socket A we retrieve handle X from the server and do the poll, when we close the socket in a second thread it looks like its duplicated copies are still alive because if I try a getsockopt on the handle X from inside the select code it still works when I really expected it to give an error.
The patch ensures the socket really died asking for the server to fetch me its handle. If anyone understands why the server always duplicate the handle I would appreciate the explanation.
https://source.winehq.org/source/dlls/ntdll/server.c#0990
https://bugs.winehq.org/show_bug.cgi?id=38399
Sebastian Lackner sebastian@fds-team.de changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |sebastian@fds-team.de
--- Comment #4 from Sebastian Lackner sebastian@fds-team.de --- (In reply to Bruno Jesus from comment #3)
Created attachment 51263 [details] patch
The patch is not completely correct, you are passing a wrong file descriptor to the first release_sock_fd() call (should be "fd" instead of "fds[j].fd"). Moreover I think it would be better to add a check to avoid calling release_sock_fd() with -1 (the rest of the code also checks this).
The attached patch solves the issue, but I'm looking for cheaper ways. The problem is that all descriptors are duplicated (dup call) for reasons I don't understand so when we call select with socket A we retrieve handle X from the server and do the poll, when we close the socket in a second thread it looks like its duplicated copies are still alive because if I try a getsockopt on the handle X from inside the select code it still works when I really expected it to give an error.
You already wrote the correct reason into the comment of your patch. The idea is to make sure that handles are never closed while they are still being used by a different thread. Of course it would also be possible to use refcounting for that, but I assume dup(...) was used just because its easier. ;)
When I understand the problem correctly, then using a complicated approach with get_sock_fd() and release_sock_fd() is the only way to do it properly with the current design. If ws2_32 code gets moved to ntdll at some point in the future then there might be ways to do it easier.
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #5 from Bruno Jesus 00cpxxx@gmail.com --- (In reply to Sebastian Lackner from comment #4)
The patch is not completely correct, you are passing a wrong file descriptor to the first release_sock_fd() call (should be "fd" instead of "fds[j].fd"). Moreover I think it would be better to add a check to avoid calling release_sock_fd() with -1 (the rest of the code also checks this).
You are right, classic copy & paste issues that happen when you are still typing code after 1am ;-)
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #6 from Bruno Jesus 00cpxxx@gmail.com --- (In reply to Sebastian Lackner from comment #4)
When I understand the problem correctly, then using a complicated approach with get_sock_fd() and release_sock_fd() is the only way to do it properly with the current design. If ws2_32 code gets moved to ntdll at some point in the future then there might be ways to do it easier.
To use the patch approach I believe your lock free cache patch would be good, to reduce performance impact. I have to test this with much larger select sets to check.
The main problem is how to tell if POLLHUP was a misc socket problem, socket closed in other thread or socket didn't connect to server. Windows select call reacts differently to these 3 situations and they are all covered in the tests now. If there was a wine_server_handle_exists( SOCKET2HANDLE(s) ) it could at least eliminate the need for the dup call.
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #7 from Sebastian Lackner sebastian@fds-team.de --- (In reply to Bruno Jesus from comment #6)
(In reply to Sebastian Lackner from comment #4)
When I understand the problem correctly, then using a complicated approach with get_sock_fd() and release_sock_fd() is the only way to do it properly with the current design. If ws2_32 code gets moved to ntdll at some point in the future then there might be ways to do it easier.
To use the patch approach I believe your lock free cache patch would be good, to reduce performance impact. I have to test this with much larger select sets to check.
Yes, the patch would definitely help. It will still have the overhead of dup() and close(), but besides that the lookup in the fd cache should be very fast. I am only hesitating a bit to remove all the #ifdefs in my patch, because it will break PowerPC support or make the code significantly larger. :/
The main problem is how to tell if POLLHUP was a misc socket problem, socket closed in other thread or socket didn't connect to server. Windows select call reacts differently to these 3 situations and they are all covered in the tests now. If there was a wine_server_handle_exists( SOCKET2HANDLE(s) ) it could at least eliminate the need for the dup call.
Yes, either something like a wine_server_handle_exists function or alternatively a function to query for the internal ntdll unix file descriptor. In some situations it should also be fine to use the "unsafe" file descriptor, especially because there is no good protection anyway. The dup() is done outside of locking ( http://source.winehq.org/git/wine.git/blob/HEAD:/dlls/ntdll/server.c#l997 ) and might lead to race-conditions when a different thread closes and creates a new file descriptor.
https://bugs.winehq.org/show_bug.cgi?id=38399
--- Comment #8 from Sebastian Lackner sebastian@fds-team.de --- Should be fixed with http://source.winehq.org/git/wine.git/commit/0d2817b161e7c363c1fab523b664500..., please retest.
https://bugs.winehq.org/show_bug.cgi?id=38399
Bruno Jesus 00cpxxx@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Fixed by SHA1| |0d2817b161e7c363c1fab523b66 | |45002e980597a Status|ASSIGNED |RESOLVED Resolution|--- |FIXED Assignee|00cpxxx@gmail.com |wine-bugs@winehq.org
--- Comment #9 from Bruno Jesus 00cpxxx@gmail.com --- (In reply to Sebastian Lackner from comment #8)
Should be fixed with http://source.winehq.org/git/wine.git/commit/ 0d2817b161e7c363c1fab523b6645002e980597a, please retest.
It certainly is because the fixed test is exactly the same problem from the application. Thanks for taking care of this.
https://bugs.winehq.org/show_bug.cgi?id=38399
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #10 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 1.7.43.
https://bugs.winehq.org/show_bug.cgi?id=38399
Anastasius Focht focht@gmx.net changed:
What |Removed |Added ---------------------------------------------------------------------------- URL|http://www.voobly.com/ |https://web.archive.org/web | |/20160426025352/http://www. | |voobly.com/updates/voobly-v | |2.2.4.32.exe CC| |focht@gmx.net
--- Comment #11 from Anastasius Focht focht@gmx.net --- Hello folks,
adding stable download link via Internet Archive for documentation.
Client version deduced from Michael's posting date: 2015-04-12
https://web.archive.org/web/20150408033335/http://www.voobly.com/
Version: 2.2.4.30
Client v2.2.4.30 was not captured so next one v2.2.4.32 is taken (few weeks away):
--- snip --- ... com,voobly)/updates/voobly-v2.2.4.19.exe 20150315030924 http://www.voobly.com/updates/voobly-v2.2.4.19.exe application/x-msdos-program 200 YIN2CQRNQOXMPYDANXT4Q6WXQXYPN66I 10423887 com,voobly)/updates/voobly-v2.2.4.20.exe 20150427081802 http://www.voobly.com/updates/voobly-v2.2.4.20.exe application/x-msdos-program 200 VXOSRUVJGTRVRTPX5ZC3YRN3F4OTIOUL 10423878 com,voobly)/updates/voobly-v2.2.4.32.exe 20160426025352 http://www.voobly.com/updates/voobly-v2.2.4.32.exe application/x-msdos-program 200 7JEV5UWRB55PMQTY7LCL3DBH774ESDYF 10426937 com,voobly)/updates/voobly-v2.2.4.35.exe 20150626063650 http://www.voobly.com/updates/voobly-v2.2.4.35.exe application/x-msdos-program 200 CUBARE7NY5ES3ZUZVR6UWIDW2NJA2WSJ 10425502 ... --- snip ---
https://web.archive.org/web/20160426025352/http://www.voobly.com/updates/voo...
https://www.virustotal.com/gui/file/367733ae0e508590725f7e5e82536a69bcfa0f72...
$ sha1sum voobly-v2.2.4.32.exe fa495ed2d10f7af64278fac4bd8c27fff8490f05 voobly-v2.2.4.32.exe
$ du -sh voobly-v2.2.4.32.exe 11M voobly-v2.2.4.32.exe
Regards