https://bugs.winehq.org/show_bug.cgi?id=42377
Bug ID: 42377 Summary: WSASend() returns invalid number of bytes when both lpOverlapped and lpNumberOfBytesSent Product: Wine Version: 2.0-rc5 Hardware: x86 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: -unknown Assignee: wine-bugs@winehq.org Reporter: zbr@ioremap.net Distribution: ---
Created attachment 57161 --> https://bugs.winehq.org/attachment.cgi?id=57161 WINEDEBUG=+all ./grtest_server
Golang uses overlapped sockets, but sets both lpOverlapped and lpNumberOfBytesSent. Sometimes lpNumberOfBytesSent is returned to be higher than number of bytes in the buffer to be sent, this breaks overlapped state machine in golang, which eventually ends up with error and connection reset.
Bug is 100% reproducible, but it requires quite a setup. First, golang uses memory mapped page hack to get time structures. These are not updated by wine (see the bug here: https://bugs.winehq.org/show_bug.cgi?id=29168). To fix it I made a patch for golang which falls back to QPC timings if startup code detects timer is stalled, see patches and discussion: https://go-review.googlesource.com/#/c/35710/ https://github.com/golang/go/issues/18537
After golang has been patched, I use this test case to compile a grpc server and python client: https://github.com/bioothod/grpc_test
# edit Makefile to specify golang path and root directory $ make windows $ wine ./grpc_server $ python client.py ...
Running multiple clients 100% ends up with connection close errors, which I debugged to incorrect (larger than the buffer size) lpNumberOfBytesSent
Here is a discussion: https://github.com/golang/go/issues/18887
Turning overlapped feature off (by setting lpOverlapped to NULL) resolves the bug, but as far as I understand it ends up with blocked WSASend() call, who will wait for buffer to be completely sent, which is kind of undesired behaviour.
Attached WINEDEBUG=+all ./grtest_server log if it helps.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #1 from Evgeniy Polyakov zbr@ioremap.net --- FYI I've just committed binary server into github page, it is statically linked so you should run it in test environment without need to install and patch golang
When you see something like WSASend: fd: 148, buf: {Len:13 Buf:0x5b524000}, qty: 37, len: 13, o: {Internal:0 InternalHigh:13 Offset:0 OffsetHigh:0 HEvent:0}, e: <nil>, r1: 0, e1: 0
in the logs, it means that 13-bytes buffer has been sent to WSASend(), but it returned 37 bytes in lpNumberOfBytesSent
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #2 from Bruno Jesus 00cpxxx@gmail.com --- Please attach a +winsock log.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #3 from Evgeniy Polyakov zbr@ioremap.net --- Created attachment 57162 --> https://bugs.winehq.org/attachment.cgi?id=57162 WINEDEBUG=+winsock ./grtest_server
This is the faulty line
WSASend: fd: 252, buf: {Len:24 Buf:0x5b62c000}, qty: 36, len: 24, o: {Internal:0 InternalHigh:24 Offset:0 OffsetHigh:0 HEvent:0}, e: <nil>, r1: 0, e1: 0
There is a number of times when WSASend() returned less bytes than the buffer size, but it should be ok, right?
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #4 from Bruno Jesus 00cpxxx@gmail.com --- Thanks for the log, I'm trying to test it myself.
I'm not saying there is no problem at Wine but even MSDN discourages the mixed use of overlapped and bytes sent pointer together.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms742203(v=vs.85).a...
... Use NULL for this parameter if the lpOverlapped parameter is not NULL to avoid potentially erroneous results. This parameter can be NULL only if the lpOverlapped parameter is not NULL. ...
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #5 from Evgeniy Polyakov zbr@ioremap.net --- Yup, I saw that warning, but it really breaks the things and from higher level point of view looks like a bug which must be fixed somewhere.
If wine developers confirm this is not a wine bug, but feature (which might be because of msdn memo, although it never shows up in windows in my tests), I will continue investigating how it can be fixed in golang, but so far I'm curious whether it can be fixed in wine or confirmed not to be.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #6 from Bruno Jesus 00cpxxx@gmail.com --- I don't have gprc and installing doesn't look simple as it has other dependencies for python, is there any other way to test?
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #7 from Evgeniy Polyakov zbr@ioremap.net --- I've just added a client binary statically compiled for x86_64 linux into github repository https://github.com/bioothod/grpc_test
Server behaves a bit 'differently' than python client, probably because of 10-fold performance difference, but with go client it only shows that returned number of bytes is smaller than the buffer size, this nevertheless leads to protocol errors which I force to panic on, hope it will help you.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #8 from Bruno Jesus 00cpxxx@gmail.com --- I'm not sure on how to see the error. I'm running 5 clients against the server and nothing bad seems to happen.
So I installed pip then installed grpc but the python still requires grtest_pb2 which pip can't find. I tried git cloning grpc and compiled it but the client still complains about the same library.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #9 from Bruno Jesus 00cpxxx@gmail.com --- I decided to test Wine-Staging instead, like you are doing. And then I can reproduce the error.
WSASend: fd: 164, buf: {Len:33 Buf:0x5b4f4000}, qty: 23, len: 33, o: {Internal:0 InternalHigh:33 Offset:0 OffsetHigh:0 HEvent:0}, e: <nil>, r1: 0, e1: 0 WSASend1: ExecIO returns1: qty: 23, origlen: 33, buflen: 33 Write: ExecIO: n: 23, len: 33, error: <nil>
So far I could not reproduce with vanilla Wine yet.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #10 from Bruno Jesus 00cpxxx@gmail.com --- With great help from Sebastian Lackner we got a bit further. Wine is not writing the wrong number of bytes. The problem (so far) is that the Go app is receiving the WSASend notification of the previous call on the current call.
Example (with extra debug added):
====Previous WSASend call==== WS2_sendto socket 00e8, wsabuf 0x5b4a056c, nbufs 1, flags 0, to (nil), tolen 0, ovl 0x5b4a0540, func (nil) WS2_sendto fd=54, options=0 WS2_sendto WS_AddCompletion 0xe8, cvalue 5b4a0540, sent = 32 WS2_sendto -> 32 bytes (no overlap necessary)
====Current WSASend call==== WS2_sendto socket 00e8, wsabuf 0x5b4a056c, nbufs 1, flags 0, to (nil), tolen 0, ovl 0x5b4a0540, func (nil) WS2_sendto fd=54, options=0 WS2_sendto WS_AddCompletion 0xe8, cvalue 5b4a0540, sent = 33 WS2_sendto -> 33 bytes (no overlap necessary)
====Go error==== WSASend: fd: 232, buf: {Len:33 Buf:0x5b4fc000}, qty: 32, len: 33, o: {Internal:0 InternalHigh:33 Offset:0 OffsetHigh:0 HEvent:0}, e: <nil>, r1: 0, e1: 0 WSASend1: ExecIO returns1: qty: 32, origlen: 33, buflen: 33
Note that qty = 32 (previous call) while expected sent = 33 (current call)
The reasons are not known yet. lpNumberOfBytesSent is correct, but not used by Go which always wait for the completion notification.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #11 from Bruno Jesus 00cpxxx@gmail.com --- Go or the app is using one thread A to send the data (WSASend) and another thread B to check if operation was completed (GetQueuedCompletionStatus). Something is getting out of sync in the process.
What happens is as follows:
A WSASend(10 bytes) start A WSASend end B GetQueuedCompletionStatus = 10 bytes OK
A WSASend(11 bytes) start A WSASend end B GetQueuedCompletionStatus = 11 bytes OK
A WSASend(12 bytes) start A WSASend end B GetQueuedCompletionStatus = WAIT_TIMEOUT (error 258)
A WSASend(13 bytes) start B GetQueuedCompletionStatus = 12 bytes ?? A WSASend end
Go or the app is not calling GetQueuedCompletionStatus again for the timed out operation. So in the next WSASend it receives prematurely the results of the previous call.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #12 from Evgeniy Polyakov zbr@ioremap.net --- Yup, that sounds like the cause, it is consistent with what we see.
Another hint is that WSASend() sometimes immediately returns completed operation, i.e. without queuing anything, and probably sometimes in this case it returns number of bytes written in the previous operation - in this case there is no GetQueuedCompletionStatus() at least in the error trace, but maybe was wrecked previously.
https://bugs.winehq.org/show_bug.cgi?id=42377
Sebastian Lackner sebastian@fds-team.de changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |erich.e.hoover@wine-staging | |.com, michael@fds-team.de, | |sebastian@fds-team.de Product|Wine |Wine-staging Component|-unknown |-unknown
--- Comment #13 from Sebastian Lackner sebastian@fds-team.de --- I believe this issue is wine-staging specific. More details about my investigations and a test patch will follow soon. Assigning to me to avoid duplicate work by other people. ;)
https://bugs.winehq.org/show_bug.cgi?id=42377
Sebastian Lackner sebastian@fds-team.de changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|wine-bugs@winehq.org |sebastian@fds-team.de
https://bugs.winehq.org/show_bug.cgi?id=42377
Sebastian Lackner sebastian@fds-team.de changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|sebastian@fds-team.de |wine-bugs@winehq.org
--- Comment #14 from Sebastian Lackner sebastian@fds-team.de --- (In reply to Sebastian Lackner from comment #13)
I believe this issue is wine-staging specific. More details about my investigations and a test patch will follow soon. Assigning to me to avoid duplicate work by other people. ;)
This should be fixed in Wine Staging 2.1, please retest.
Actually, in contrast to the bug description, the problem was not directly related to WSASend, but to SetFileCompletionNotificationModes. Previously, Wine Staging did not keep the settings when a client was accepted (with AcceptEx for example). The reason why the variables no longer matched is that the application got confused by completion notifications which were not expected. Plain Wine was not affected by the problem because the API is not supported yet.
The patch has been updated and a regression test has been added: https://github.com/wine-compholio/wine-staging/blob/master/patches/kernel32-...
Thanks again for the bug report!
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #15 from Evgeniy Polyakov zbr@ioremap.net --- I've just checked wine git (as of commit 06c5a9ab55751b1c66ea0847aea4d4a45d8d343c) and I can confirm that it works with my test.
But I do not see your patches in the tree, is that you dropped something from staging and it started working and eventually you will return it with your patches will come in or is it just a luck?
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #16 from Sebastian Lackner sebastian@fds-team.de --- (In reply to Evgeniy Polyakov from comment #15)
I've just checked wine git (as of commit 06c5a9ab55751b1c66ea0847aea4d4a45d8d343c) and I can confirm that it works with my test.
The problem was only present in Staging versions. You would have to compile Wine Staging 2.1 (or newer) in order to verify that this issue is fixed. Compiling Wine without Staging patches is not sufficient because it was never affected.
But I do not see your patches in the tree, is that you dropped something from staging and it started working and eventually you will return it with your patches will come in or is it just a luck?
No, nothing was dropped. A bug in an existing patchset was fixed, and a testcase was added to show that this change is correct. Thanks to your bug report this was fixed before the patches got added to the development branch - which is exactly the purpose of the Staging tree. Obviously, when the patchset is later sent upstream, we will use the version including the fix, so the bug will not return.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #17 from Evgeniy Polyakov zbr@ioremap.net --- Argh, I see.
Ok, I'm trying to setup staging patches, here is the error:
wine-staging$ ./patches/patchinstall.sh DESTDIR="/home/zbr/awork/wine" --all ... Applying /home/zbr/awork/wine-staging/patches/kernel32-BeingDebugged/0001-kernel32-Replace-Peb-BeingDebugged-check-by-CheckRem.patch error: patch failed: dlls/kernel32/process.c:1087 error: dlls/kernel32/process.c: patch does not apply ERROR: Failed to apply patch, aborting!
I can not disable this patchset, since Staging (as well as something else) depends on it.
wine tree: 06c5a9ab55751b1c66ea0847aea4d4a45d8d343c wine-staging tree: 29d851bf3af5581d1048bbd2af9b5163e54a4352
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #18 from Sebastian Lackner sebastian@fds-team.de --- (In reply to Evgeniy Polyakov from comment #17)
wine tree: 06c5a9ab55751b1c66ea0847aea4d4a45d8d343c wine-staging tree: 29d851bf3af5581d1048bbd2af9b5163e54a4352
Please check "./patches/patchinstall.sh --upstream-commit" to find out which corresponding Wine commit should be used (or, even easier, just use your distribution build scripts / WineHQ download to obtain the new version). In this case some changes got upstream which require rebasing the Staging patches.
https://bugs.winehq.org/show_bug.cgi?id=42377
--- Comment #19 from Evgeniy Polyakov zbr@ioremap.net --- I've checked out staging ID reported by ./patches/patchinstall.sh --upstream-commit, applied all patches, recompiled wine and can confirm, that bug is gone.
Thank you.
https://bugs.winehq.org/show_bug.cgi?id=42377
Evgeniy Polyakov zbr@ioremap.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED
--- Comment #20 from Evgeniy Polyakov zbr@ioremap.net --- I've checked out staging ID reported by ./patches/patchinstall.sh --upstream-commit, applied all patches, recompiled wine and can confirm, that bug is gone.
Thank you.
https://bugs.winehq.org/show_bug.cgi?id=42377
Alistair Leslie-Hughes leslie_alistair@hotmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #21 from Alistair Leslie-Hughes leslie_alistair@hotmail.com --- Closing Fixed Staging 3.14