Based on a finding here: https://github.com/ValveSoftware/Proton/issues/6709#issuecomment-1626059391
On Windows, a small (or 0) receive buffer works somewhat differently where async I/O is concerned. It is supposed to bypass some system buffering and deliver the data directly into the user buffer when an async receive is pending, so apps may use a 0 buffer size as a small optimization to skip extra buffer copies in the system. On Unix, if the buffer is too small to hold the network packet, the packet is simply dropped. As I interpret my tests, though, Windows doesn't actually drop packets on an async socket even if no receive is currently pending (it looks like setting a smaller buffer has no effect at all).
This behaves differently with synchronous I/O: in that case Windows can also drop packets which don't fit the smaller configured buffer. I think we can just never set receive buffers smaller than the default one on Unix sockets. The only effect of actually setting a smaller buffer that I can think of is that the system starts dropping packets more eagerly, which is wrong for the async socket I/O case. It might not be so wrong for the sync socket I/O case, but implementing that specifically per I/O type would add a lot of complication and would only lead to us dropping packets more eagerly, while we still probably wouldn't be doing it exactly like Windows, as that depends on multiple factors in the Unix and Windows network stacks.
-- v2: server: Don't set SO_RCVBUF below Windows default value on Unix socket.
From: Paul Gofman <pgofman@codeweavers.com>
---
 dlls/ws2_32/tests/sock.c | 59 +++++++++++++++++++++++++++++++++++++++-
 server/sock.c            | 14 ++++++++--
 2 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/dlls/ws2_32/tests/sock.c b/dlls/ws2_32/tests/sock.c
index a1fefb45d1e..31dff1a11c7 100644
--- a/dlls/ws2_32/tests/sock.c
+++ b/dlls/ws2_32/tests/sock.c
@@ -161,6 +161,7 @@ static GUID WSARecvMsg_GUID = WSAID_WSARECVMSG;

 static SOCKET setup_server_socket(struct sockaddr_in *addr, int *len);
 static SOCKET setup_connector_socket(const struct sockaddr_in *addr, int len, BOOL nonblock);
+static int sync_recv(SOCKET s, void *buffer, int len, DWORD flags);

 static void tcp_socketpair_flags(SOCKET *src, SOCKET *dst, DWORD flags)
 {
@@ -1225,7 +1226,7 @@ static void test_set_getsockopt(void)
         {AF_INET6, SOCK_DGRAM, IPPROTO_IPV6, IPV6_UNICAST_HOPS, TRUE, {1, 1, 4}, {0}, FALSE},
         {AF_INET6, SOCK_DGRAM, IPPROTO_IPV6, IPV6_V6ONLY, TRUE, {1, 1, 1}, {0}, TRUE},
     };
-    SOCKET s, s2;
+    SOCKET s, s2, src, dst;
     int i, j, err, lasterr;
     int timeout;
     LINGER lingval;
@@ -1237,6 +1238,7 @@ static void test_set_getsockopt(void)
     int expected_err, expected_size;
     DWORD value, save_value;
     UINT64 value64;
+    char buffer[4096];

     struct _prottest
     {
@@ -1302,6 +1304,61 @@ static void test_set_getsockopt(void)
     ok( !err, "getsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
     ok( value == 4096, "expected 4096, got %lu\n", value );
+    value = 0;
+    size = sizeof(value);
+    err = setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&value, size);
+    ok( !err, "setsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
+    value = 0xdeadbeef;
+    err = getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+    ok( !err, "getsockopt(SO_RCVBUF) failed error: %u\n", WSAGetLastError() );
+    ok( value == 0, "expected 0, got %lu\n", value );
+
+    /* Test non-blocking receive with too short SO_RCVBUF. */
+    tcp_socketpair(&src, &dst);
+    set_blocking(src, FALSE);
+    set_blocking(dst, FALSE);
+
+    value = 0;
+    size = sizeof(value);
+    err = setsockopt(src, SOL_SOCKET, SO_SNDBUF, (char *)&value, size);
+    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+
+    value = 0xdeadbeef;
+    err = getsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+    ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+    if (value >= sizeof(buffer) * 3)
+    {
+        value = 1024;
+        size = sizeof(value);
+        err = setsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, size);
+        ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+        value = 0xdeadbeef;
+        err = getsockopt(dst, SOL_SOCKET, SO_RCVBUF, (char *)&value, &size);
+        ok( !err, "got %d, error %u.\n", err, WSAGetLastError() );
+        ok( value == 1024, "expected 1024, got %lu\n", value );
+
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+        err = send(src, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d\n", err);
+
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+        err = sync_recv(dst, buffer, sizeof(buffer), 0);
+        ok(err == sizeof(buffer), "got %d, error %u\n", err, WSAGetLastError());
+    }
+    else
+    {
+        skip("Default SO_RCVBUF %lu is too small, skipping test.\n", value);
+    }
+
+    closesocket(src);
+    closesocket(dst);
+
     /* SO_LINGER */
     for( i = 0; i < ARRAY_SIZE(linger_testvals);i++) {
         size = sizeof(lingval);
diff --git a/server/sock.c b/server/sock.c
index b63412ab216..ac55a5448f7 100644
--- a/server/sock.c
+++ b/server/sock.c
@@ -197,6 +197,8 @@ struct bound_addr
#define MAX_ICMP_HISTORY_LENGTH 8
+#define MIN_RCVBUF 65536
+
 struct sock
 {
     struct object obj;             /* object header */
@@ -1916,7 +1918,14 @@ static int init_socket( struct sock *sock, int family, int type, int protocol )

     len = sizeof(value);
     if (!getsockopt( sockfd, SOL_SOCKET, SO_RCVBUF, &value, &len ))
+    {
+        if (value < MIN_RCVBUF)
+        {
+            value = MIN_RCVBUF;
+            setsockopt( sockfd, SOL_SOCKET, SO_RCVBUF, &value, sizeof(value) );
+        }
         sock->rcvbuf = value;
+    }

     len = sizeof(value);
     if (!getsockopt( sockfd, SOL_SOCKET, SO_SNDBUF, &value, &len ))
@@ -3064,7 +3073,7 @@ static void sock_ioctl( struct fd *fd, ioctl_code_t code, struct async *async )

     case IOCTL_AFD_WINE_SET_SO_RCVBUF:
     {
-        DWORD rcvbuf;
+        DWORD rcvbuf, set_rcvbuf;

         if (get_req_data_size() < sizeof(rcvbuf))
         {
@@ -3072,8 +3081,9 @@ static void sock_ioctl( struct fd *fd, ioctl_code_t code, struct async *async )
             return;
         }
         rcvbuf = *(DWORD *)get_req_data();
+        set_rcvbuf = max( rcvbuf, MIN_RCVBUF );

-        if (!setsockopt( unix_fd, SOL_SOCKET, SO_RCVBUF, (char *)&rcvbuf, sizeof(rcvbuf) ))
+        if (!setsockopt( unix_fd, SOL_SOCKET, SO_RCVBUF, (char *)&set_rcvbuf, sizeof(set_rcvbuf) ))
             sock->rcvbuf = rcvbuf;
         else
             set_error( sock_get_ntstatus( errno ) );
On Mon Sep 11 21:41:23 2023 +0000, Zebediah Figura wrote:
Sorry, I think I was failing to think through things properly. Let me start over, and try to understand the problem. So the observation is that Windows *will* drop packets that overflow the buffer size, but internally the buffer size is capped to some minimum value regardless of the value of SO_RCVBUF? And we know that said minimum value is the same as the initial value of SO_RCVBUF, at least on 8+? If I understand this all correctly: should we be trying to clamp our internal SO_RCVBUF to that value on Windows, instead of whatever the initial value of Unix SO_RCVBUF happens to be?
We also had a chat with Zeb to discuss this. To summarize:
- Saying that Windows (or Linux) drops packets in the TCP case is not quite correct; it depends on what the sender is doing, which is retransmitting the packet, essentially blocking further sends.
- The other question raised was whether SO_RCVBUF does anything at all on Windows (e.g., shouldn't we just always set it to 64k?).
I designed an ad-hoc test which tries to measure the effect of the Windows rcvbuf in the following way:
- the receiving part (Windows) sets the SO_RCVBUF value being tested and waits for 2 sec after accept before calling receive;
- the sending part (Linux) tries to send as much data as possible during ~300 ms using async send;
- then the receiving part wakes up, receives everything, and prints how much data was actually received.
Testing like that confirms that setting values below 64k (the current Windows default reported RCVBUF) doesn't seem to change anything, while increasing the value above that does increase the amount of data received this way. 64k is probably not random; it is about the maximum UDP packet size and TCP fragment size.
So I think the bottom line of this is:
- avoiding setting a short SO_RCVBUF still makes sense;
- minding the Unix system default probably doesn't; we can just use a 64k minimum instead;
- we still need to set SO_RCVBUF for bigger values, since it affects things (in a somewhat similar way on Windows and Unix).
I've sent an updated patch.
On Tue Sep 12 16:50:56 2023 +0000, Paul Gofman wrote:
As a separate note (not quite related here), spotted while testing this: SO_SNDBUF with 0 size also behaves quite differently on Windows compared to what we have now. For one, on Windows setting SO_SNDBUF to 0 makes send() always block, even if the socket is created with WSA_FLAG_OVERLAPPED and FIONBIO is set. I am not currently aware of any app depending on that, but I guess it is possible.
This merge request was approved by Zebediah Figura.