When two sends are performed in parallel on a socket, the send that reaches the server's send_socket handler second will always be considered blocking if the first send hasn't yet called set_async_direct_result() in the client's ntdll. Similarly, if select() comes shortly after send(), it may report the socket as not write-ready. We don't try to avoid this race. If the two sends (or the send and the select) are not synchronized by the application, the application is unlikely to depend on which one gets processed first. Even now, the call made earlier in wall-clock time is not guaranteed to be processed first.
Yeah. I'd put it in even stronger terms, actually. The race that's being fixed here comes from the gap between send_socket and set_async_direct_result, and that gap simply doesn't exist on Windows. Everything happens inside NtDeviceIoControlFile(), with no synchronization points in between.
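To make the gap concrete, here is a deliberately simplified, single-threaded model that replays the two interleavings step by step. It is a sketch under stated assumptions, not Wine code: `struct model_socket`, `model_send_socket()`, `model_set_direct_result()` and `model_select_writable()` are hypothetical stand-ins for the server's send_socket handling, the client's set_async_direct_result() call and select(). The model reduces the server's state to a single "earlier send still pending" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT Wine code: the server state is reduced to one
 * flag meaning "an earlier send was accepted, but the client has not yet
 * reported its result via set_async_direct_result()". */
struct model_socket {
    bool pending_send;
};

enum send_status { SEND_OK, SEND_WOULD_BLOCK };

/* Stand-in for the server side of a send (send_socket): a send arriving
 * while an earlier one is still pending is treated as blocking. */
static enum send_status model_send_socket(struct model_socket *s)
{
    if (s->pending_send)
        return SEND_WOULD_BLOCK;
    s->pending_send = true;
    return SEND_OK;
}

/* Stand-in for the client reporting the direct result back to the server,
 * which closes the gap opened by model_send_socket(). */
static void model_set_direct_result(struct model_socket *s)
{
    assert(s->pending_send);
    s->pending_send = false;
}

/* Stand-in for select(): the socket looks write-ready only when no
 * earlier send is still pending. */
static bool model_select_writable(const struct model_socket *s)
{
    return !s->pending_send;
}

/* Interleaving A: the first send finishes completely before the second
 * one reaches the server, so the second send is not blocked. */
static enum send_status run_serialized(void)
{
    struct model_socket s = { false };
    model_send_socket(&s);
    model_set_direct_result(&s);
    return model_send_socket(&s);
}

/* Interleaving B: the second send reaches the server inside the gap
 * between the first send's model_send_socket() and its direct result. */
static enum send_status run_in_gap(void)
{
    struct model_socket s = { false };
    model_send_socket(&s);
    enum send_status second = model_send_socket(&s);
    model_set_direct_result(&s);
    return second;
}

/* select() issued inside the same gap reports the socket as not writable. */
static bool select_in_gap(void)
{
    struct model_socket s = { false };
    model_send_socket(&s);
    return model_select_writable(&s);
}
```

Only the ordering of steps differs between the two runs. That is the point made above: the result depends on an interleaving the application never synchronized. On Windows, the first send's equivalent of both steps happens inside a single NtDeviceIoControlFile() call, so interleaving B has no window to occur in.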
I wish dearly that there were a way to introduce a synchronous model for server calls. Asyncs are far, far too complex, and the price of that complexity is quite evident to me.