Based on a finding here: https://github.com/ValveSoftware/Proton/issues/6709#issuecomment-1626059391
On Windows, a small (or 0) receive buffer works a bit differently where async I/O is concerned. It is supposed to bypass some system buffering and receive the data directly into the user buffer if an async receive is pending, so apps may be using a 0 buffer size as a small optimization to skip extra buffer copies in the system. On Unix, if the buffer is too small to receive the network packet, the packet is simply dropped. As I interpret my tests, though, Windows doesn't actually drop packets on an async socket even if no receive is currently pending (it looks like setting a smaller buffer has no effect at all).
This behaves differently with synchronous I/O: in that case Windows can also drop packets which don't fit the smaller configured buffer. I think we can simply never set receive buffers smaller than the default one on Unix sockets. The only effect of actually setting a smaller one that I can think of is that the system starts dropping packets more eagerly, which is wrong for the async socket I/O case. That might not be so wrong for the sync socket I/O case, but implementing that specifically per I/O type would add a lot of complication and would only lead to us dropping packets more eagerly, while we would probably still not be doing it exactly like Windows, since it depends on multiple factors in the Unix and Windows network stacks.
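For illustration, here is a minimal sketch of the application-side pattern described above, assuming a plain Winsock overlapped receive; the helper name and error handling are hypothetical, not taken from any particular application:

```c
#include <winsock2.h>

/* Hypothetical helper: shrink SO_RCVBUF to 0 and keep an overlapped receive
 * pending, so that on Windows incoming data bypasses the system buffer and
 * lands directly in the caller's buffer. */
static BOOL post_direct_recv(SOCKET s, char *buf, ULONG buflen, OVERLAPPED *ovr)
{
    int zero = 0;
    WSABUF wsabuf;
    DWORD flags = 0;

    /* The "optimization" such apps rely on: no system-level receive buffer. */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&zero, sizeof(zero)))
        return FALSE;

    wsabuf.buf = buf;
    wsabuf.len = buflen;
    /* While this receive is pending, data is written straight into buf;
     * completion is signaled through ovr (event or completion port). */
    if (WSARecv(s, &wsabuf, 1, NULL, &flags, ovr, NULL) == SOCKET_ERROR
        && WSAGetLastError() != WSA_IO_PENDING)
        return FALSE;
    return TRUE;
}
```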
From: Paul Gofman <pgofman@codeweavers.com>
---
 dlls/ws2_32/tests/sock.c | 59 +++++++++++++++++++++++++++++++++++++++-
 server/sock.c            | 10 +++++--
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/dlls/ws2_32/tests/sock.c b/dlls/ws2_32/tests/sock.c
index e53d9991e56..4fe50838c64 100644
--- a/dlls/ws2_32/tests/sock.c
+++ b/dlls/ws2_32/tests/sock.c
@@ -161,6 +161,7 @@ static GUID WSARecvMsg_GUID = WSAID_WSARECVMSG;
 
 static SOCKET setup_server_socket(struct sockaddr_in *addr, int *len);
 static SOCKET setup_connector_socket(const struct sockaddr_in *addr, int len, BOOL nonblock);
+static int sync_recv(SOCKET s, void *buffer, int len, DWORD flags);
 
 static void tcp_socketpair_flags(SOCKET *src, SOCKET *dst, DWORD flags)
 {
@@ -1225,7 +1226,7 @@ static void test_set_getsockopt(void)
         {AF_INET6, SOCK_DGRAM, IPPROTO_IPV6, IPV6_UNICAST_HOPS, TRUE, {1, 1, 4}, {0}, FALSE},
         {AF_INET6, SOCK_DGRAM, IPPROTO_IPV6, IPV6_V6ONLY, TRUE, {1, 1, 1}, {0}, TRUE},
     };
-    SOCKET s, s2;
+    SOCKET s, s2, src, dst;
     int i, j, err, lasterr;
     int timeout;
     LINGER lingval;
@@ -1237,6 +1238,7 @@ static void test_set_getsockopt(void)
     int expected_err, expected_size;
     DWORD value, save_value;
     UINT64 value64;
+    char buffer[4096];
 
     struct _prottest
     {
@@ -1302,6 +1304,61 @@ static void test_set_getsockopt(void)
     ok( !err, "getsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
     ok( value == 4096, "expected 4096, got %lu\n", value );
+    value = 0;
+    size = sizeof(value);
+    err = setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&value, size);
+    ok( !err, "setsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
+    value = 0xdeadbeef;
+    err = getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+    ok( !err, "getsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
+    ok( value == 0, "expected 0, got %lu\n", value );
+
+    /* Test non-blocking receive with too short SO_RCVBUF. */
+    tcp_socketpair(&src, &dst);
+    set_blocking(src, FALSE);
+    set_blocking(dst, FALSE);
+
+    value = 0;
+    size = sizeof(value);
+    err = setsockopt(src, SOL_SOCKET, SO_SNDBUF, (char *)&value, size);
+    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+
+    value = 0xdeadbeef;
+    err = getsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+    if (value >= sizeof(buffer) * 3)
+    {
+        value = 1024;
+        size = sizeof(value);
+        err = setsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, size);
+        ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+        value = 0xdeadbeef;
+        err = getsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+        ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+        ok( value == 1024, "expected 1024, got %lu\n", value );
+
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+    }
+    else
+    {
+        skip("Default SO_RCVBUF %lu is too small, skipping test.\n", value);
+    }
+
+    closesocket(src);
+    closesocket(dst);
+
     /* SO_LINGER */
     for( i = 0; i < ARRAY_SIZE(linger_testvals);i++) {
         size = sizeof(lingval);
diff --git a/server/sock.c b/server/sock.c
index b63412ab216..abf267f13f6 100644
--- a/server/sock.c
+++ b/server/sock.c
@@ -240,6 +240,7 @@ struct sock
     struct poll_req *main_poll;  /* main poll */
     union win_sockaddr addr;     /* socket name */
     int addr_len;                /* socket name length */
+    unsigned int default_rcvbuf; /* initial advisory recv buffer size */
     unsigned int rcvbuf;         /* advisory recv buffer size */
     unsigned int sndbuf;         /* advisory send buffer size */
     unsigned int rcvtimeo;       /* receive timeout in ms */
@@ -1733,6 +1734,7 @@ static struct sock *create_socket(void)
     sock->reset = 0;
     sock->reuseaddr = 0;
     sock->exclusiveaddruse = 0;
+    sock->default_rcvbuf = 0;
     sock->rcvbuf = 0;
     sock->sndbuf = 0;
     sock->rcvtimeo = 0;
@@ -1916,7 +1918,7 @@ static int init_socket( struct sock *sock, int family, int type, int protocol )
     len = sizeof(value);
     if (!getsockopt( sockfd, SOL_SOCKET, SO_RCVBUF, &value, &len ))
-        sock->rcvbuf = value;
+        sock->rcvbuf = sock->default_rcvbuf = value;
 
     len = sizeof(value);
     if (!getsockopt( sockfd, SOL_SOCKET, SO_SNDBUF, &value, &len ))
@@ -2011,6 +2013,7 @@ static struct sock *accept_socket( struct sock *sock )
     acceptsock->reuseaddr = sock->reuseaddr;
     acceptsock->exclusiveaddruse = sock->exclusiveaddruse;
     acceptsock->sndbuf = sock->sndbuf;
+    acceptsock->default_rcvbuf = sock->default_rcvbuf;
     acceptsock->rcvbuf = sock->rcvbuf;
     acceptsock->sndtimeo = sock->sndtimeo;
     acceptsock->rcvtimeo = sock->rcvtimeo;
@@ -3064,7 +3067,7 @@ static void sock_ioctl( struct fd *fd, ioctl_code_t code, struct async *async )
 
     case IOCTL_AFD_WINE_SET_SO_RCVBUF:
     {
-        DWORD rcvbuf;
+        DWORD rcvbuf, set_rcvbuf;
 
         if (get_req_data_size() < sizeof(rcvbuf))
         {
@@ -3072,8 +3075,9 @@ static void sock_ioctl( struct fd *fd, ioctl_code_t code, struct async *async )
             return;
         }
         rcvbuf = *(DWORD *)get_req_data();
+        set_rcvbuf = max( rcvbuf, sock->default_rcvbuf );
 
-        if (!setsockopt( unix_fd, SOL_SOCKET, SO_RCVBUF, (char *)&rcvbuf, sizeof(rcvbuf) ))
+        if (!setsockopt( unix_fd, SOL_SOCKET, SO_RCVBUF, (char *)&set_rcvbuf, sizeof(set_rcvbuf) ))
             sock->rcvbuf = rcvbuf;
         else
             set_error( sock_get_ntstatus( errno ) );
Zebediah Figura (@zfigura) commented about dlls/ws2_32/tests/sock.c:
> +    ok( value == 0, "expected 0, got %lu\n", value );
> +
> +    /* Test non-blocking receive with too short SO_RCVBUF. */
> +    tcp_socketpair(&src, &dst);
> +    set_blocking(src, FALSE);
> +    set_blocking(dst, FALSE);
> +
> +    value = 0;
> +    size = sizeof(value);
> +    err = setsockopt(src, SOL_SOCKET, SO_SNDBUF, (char *)&value, size);
> +    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
> +
> +    value = 0xdeadbeef;
> +    err = getsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
> +    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
> +    if (value >= sizeof(buffer) * 3)
When can this happen?
On Fri Sep 1 20:47:08 2023 +0000, Zebediah Figura wrote:
> When can this happen?
This happens on Testbot machines before Win8. Here is the run of my patch without this check: https://testbot.winehq.org/JobDetails.pl?Key=136940#k207 . This also notably (and maybe expectedly) leads to missing the last 4096 bytes (so it is as if Windows doesn't effectively use smaller buffers with async I/O). It can also happen on Linux if one configures a smaller default buffer size in /proc/sys/net/core/rmem_default.
On Fri Sep 1 21:00:19 2023 +0000, Paul Gofman wrote:
> This happens on Testbot machines before Win8. [...]
Oh, I actually misread this test... now that I can properly tell which calls are SNDBUF and which are RCVBUF, I'm confused. Partly as to why we're bothering to test the default value of RCVBUF in the first place, but also: if the point is to test that small values of RCVBUF don't result in dropping packets, don't those test results contradict that assertion? Or are the applications depending on this simply broken on Windows < 8?
One additional thing that bothers me about this, now that I look—one thing that *also* changed between 7 and 8 is that TCP loopback got some sort of fast path that results in its packets always being synchronous. I wonder if "packets are never dropped despite RCVBUF" is actually just a consequence of loopback, and is not actually true if packets come over the network.
On Fri Sep 1 21:24:40 2023 +0000, Zebediah Figura wrote:
> Oh, I actually misread this test... [...]
I am not testing the default value as such; having it big enough for 3 buffers is a prerequisite for the test to exercise what it is meant to exercise. Of course I could probably use a smaller buffer, but I don't see what that would change: given that the default buffer size is not specified anywhere and can be configured on Linux (even though such small default values don't make sense and are likely to break things for apps anyway), it should probably still be checked.
Yes, the point is that small values of RCVBUF do not result in dropped packets (*at least* as long as the data fits into the default buffer size). I am afraid I am not quite following: which assertion, and how do those results contradict it? I am not testing the default buffer; I am testing what happens when a small buffer size is set (while, for the test to work predictably, the default buffer should be big enough). The applications of concern are those which set the buffer size to some small value (or 0, as in the present case). Before Win8 they still won't be broken in this respect as long as they use async receives (because network packets will fit into the 8192-byte default buffer), while Linux takes the small buffer size more seriously and starts dropping packets.
I am attaching the (very ad hoc) manual test on top of this patch which I used to test this over a real network. To use it, set the hardcoded server_ip at the top of tcp_socketpair_flags, then run 'ws2_32_test.exe sock' on the server side and after that 'ws2_32_test.exe sock send' on the client side. I think the test shows that nothing is lost over a real network either.
[test.patch](/uploads/103764283ca0246cbfae3efaf5d5912c/test.patch)
On Fri Sep 1 22:16:03 2023 +0000, Paul Gofman wrote:
> I am not testing the default value as such; having it big enough for 3 buffers is a prerequisite for the test to exercise what it is meant to exercise. [...]
As for my original test in the patch, I think it shows that without my changes setting a short receive buffer length breaks things *even* on a loopback interface and makes no sense (even though the ultimate details may be different on a real network interface).
On Fri Sep 1 22:29:01 2023 +0000, Paul Gofman wrote:
> As for my original test in the patch, I think it shows that without my changes setting a short receive buffer length breaks things *even* on a loopback interface and makes no sense (even though the ultimate details may be different on a real network interface).
Sorry, I think I was failing to think through things properly. Let me start over, and try to understand the problem.
So the observation is that Windows *will* drop packets that overflow the buffer size, but internally the buffer size is capped to some minimum value regardless of the value of SO_RCVBUF?
And we know that said minimum value is the same as the initial value of SO_RCVBUF, at least on 8+?
If I understand this all correctly: should we be trying to clamp our internal SO_RCVBUF to that value on Windows, instead of whatever the initial value of Unix SO_RCVBUF happens to be?
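If that is the direction, here is a minimal sketch of what such a wineserver-side clamp might look like, assuming a fixed Windows default instead of the initial Unix SO_RCVBUF. The 8192 constant and the helper name are assumptions for illustration (8192 is what the pre-Win8 discussion above suggests), not verified Windows values:

```c
#include <sys/socket.h>

/* Assumed placeholder for the Windows default/minimum receive buffer size;
 * the real value (and whether it changed in Win8+) would need to be
 * established by tests. */
#define ASSUMED_WIN_DEFAULT_RCVBUF 8192

static int set_unix_rcvbuf_clamped( int unix_fd, unsigned int rcvbuf )
{
    unsigned int set_rcvbuf = rcvbuf;

    /* Never hand the Unix stack a value smaller than the assumed Windows
     * default, so Unix does not start dropping packets that Windows would
     * still deliver. */
    if (set_rcvbuf < ASSUMED_WIN_DEFAULT_RCVBUF)
        set_rcvbuf = ASSUMED_WIN_DEFAULT_RCVBUF;

    return setsockopt( unix_fd, SOL_SOCKET, SO_RCVBUF,
                       (char *)&set_rcvbuf, sizeof(set_rcvbuf) );
}
```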