https://bugs.winehq.org/show_bug.cgi?id=54413
Bug ID: 54413 Summary: ws2_32:sock - DuplicateHandle(socket) sometimes look like a socket in test_WSAGetOverlappedResult() on Windows Product: Wine Version: unspecified Hardware: x86-64 OS: Windows Status: NEW Severity: normal Priority: P2 Component: winsock Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com
ws2_32:sock - DuplicateHandle(socket) sometimes look like a socket in test_WSAGetOverlappedResult() on Windows:
sock.c:12512: Test failed: got 1. sock.c:12513: Test failed: got 0.
See https://test.winehq.org/data/patterns.html#ws2_32:sock
This failure is pretty rare, there are only 4 known instances: * 2022-09-27 on w1064 64 bit (21h2) * 2022-11-15 on w10pro64-hi-u8 64 bit (21h1) * 2023-01-25 on w11pro64 64 bit (21h2) * 2023-01-25 on w1064v2009 64 bit
The test creates a socket, treats it as a handle and duplicates it, then checks whether the duplicate handle behaves like a socket which it normally does not, causing WSAGetOverlappedResult() to fail (return 0) and WSAGetLastError() to be set to WSAENOTSOCK.
But in these four cases WSAGetOverlappedResult() succeeded which would indicate the duplicate handle looked like a socket?
https://bugs.winehq.org/show_bug.cgi?id=54413
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |source, testcase
https://bugs.winehq.org/show_bug.cgi?id=54413
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #1 from Zeb Figura z.figura12@gmail.com --- That is extremely weird. I have no idea how there could even be a race here...
https://bugs.winehq.org/show_bug.cgi?id=54413
--- Comment #2 from François Gouget fgouget@codeweavers.com --- Yes. It's as if sockets and handles are in separate namespaces and we sometimes get a collision. But if that were the case DuplicateHandle() should never work on sockets so that hypothesis makes no sense.
The Microsoft documentation does say one should not use DuplicateHandle() on sockets, mostly because "the duplicate handle may not be recognized by Winsock at the target process" which is not the issue here. It's strange how this seems to suggest DuplicateHandle() of sockets works but is unreliable. And more importantly it does not say why it would be unreliable.
The documentation also says "DuplicateHandle interferes with internal reference counting on the underlying (socket?) object". But I don't see how this could be the issue here.
It does suggest our test is doing something it shouldn't. But then testing corner cases is not not uncommon...
https://bugs.winehq.org/show_bug.cgi?id=54413
--- Comment #3 from Zeb Figura z.figura12@gmail.com --- Socket handles *are* NT handles, which is why you can duplicate them. The way it actually (seems to) work is that they have some in-process data as well. I'm not sure what this in-process data comprises—we don't store any—but it may simply have to do with layered service providers (which we don't implement).
Hence if you just create an NT socket handle in a target process, ws2_32 won't know about it, and will conclude it's not a socket when trying to look it up. In order to tell ws2_32 about it you're supposed to use WSADuplicateSocket() instead of DuplicateHandle() and then import it on the other end with WSASocket(). (Internally WSADuplicateSocket() seems to just call DuplicateHandle() and stash the socket handle in the returned struct.)
Which all makes this test failure very confusing—how did ws2_32 find out about this socket?
https://bugs.winehq.org/show_bug.cgi?id=54413
--- Comment #4 from Zeb Figura z.figura12@gmail.com --- Ah, I have a theory.
Because ws2_32 must have an internal socket list (which is effectly exactly what this test proves), it stands to reason that sockets must be removed from that list in closesocket().
That means that if you close the NT handle directly, via CloseHandle() or NtClose(), that per-process data won't be deleted, and a following call to a ws2_32 API will see that it's still there (and, for something like WSAGetOverlappedResult(), it *only* seems to be checking that it's a valid socket handle; it doesn't actually need to do anything with the handle, at least not if there's an event).
Testing this confirms that: create a socket, close it with CloseHandle(), then duplicate another socket into the same process and call WSAGetOverlappedResult() with the duplicate. The handle values will basically always be the same [it's known that Windows uses a free list for its handles], and as a result WSAGetOverlappedResult() will succeed.
As for where we're using CloseHandle() or NtClose(): that's iocp_async_read_closesocket(), which does indeed execute before this test.
As for why this failure is intermittent: Windows sometimes just randomly opens or closes process handles, I guess in internal threads. That's something that I encountered while testing that handles are allocated from a free list.
Fixing this is... tricky. Here's the problem: It's actually perfectly possible to delete the leftover socket data by calling closesocket(). That function must obviously call NtClose() / CloseHandle() internally, but it evidently doesn't check the return value, because it really does delete the socket data and return success, and a following call to WSAGetOverlappedResult() does return ENOTSOCK with the same handle.
So in theory we could just closesocket() the handle in iocp_async_read_closesocket(). What makes me nervous about doing that is—what if we get caught in a race in that very function, and a Windows thread allocates a new handle between the CloseHandle() and the subsequent closesocket()? closesocket() would end up closing a random internal NT handle, and that could easily cause other obscure failures down the line.
Safer is probably just to pass WSAGetOverlappedResult() a garbage handle value, one that would in practice never be allocated.
https://bugs.winehq.org/show_bug.cgi?id=54413
--- Comment #5 from Zeb Figura z.figura12@gmail.com --- (In reply to Zeb Figura from comment #4)
Safer is probably just to pass WSAGetOverlappedResult() a garbage handle value, one that would in practice never be allocated.
Actually we already do that. This test exists because it does, I think, have value—we want to show that it's not enough for the handle to be valid, it has to be a handle that ws2_32 knows about. I don't think there's another way to prove that, we need to use DuplicateHandle().
So instead I sent a patch that reorders the two tests, and puts the more "destructive" iocp_async_read_closesocket() at the end.
Hopefully that's enough to make this failure go away...
https://bugs.winehq.org/show_bug.cgi?id=54413
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Fixed by SHA1| |8ab7120badb68519eb3a76ad676 | |8c3109454e2b0 Resolution|--- |FIXED Status|NEW |RESOLVED
--- Comment #6 from Zeb Figura z.figura12@gmail.com --- Should hopefully be fixed by https://source.winehq.org/git/wine.git/commitdiff/8ab7120badb68519eb3a76ad6768c3109454e2b0. Since it's rare in the first place, let's assume fixed, and reopen if we see this happening again.
https://bugs.winehq.org/show_bug.cgi?id=54413
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #7 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 8.14.