On 10/25/22 19:27, Paul Gofman wrote:
On 10/25/22 19:04, Zebediah Figura (@zfigura) wrote:
Of course, there's a good reason for TIME_WAIT existing, but, well, Windows :-/
Do you know the reason it exists on Linux for the listening port, with the same timeout value (it is clear why it exists for the accepted socket / port)? I could not deduce that from the specs. The only mention of it I know of is the BSD SO_REUSEADDR manual page, which suggests that some implied timeout might be there, and that the Linux kernel keeps the listening port busy for the same time as the accepted port and is not going to change that. I am not entirely sure that Windows is violating any specific TCP rule by relaxing the timeout on the listening port.
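For illustration, a minimal standalone sketch of the Linux behavior I mean (error checking omitted, port number 7777 arbitrary):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr = {0};
    int listener, client, server;

    addr.sin_family = AF_INET;
    addr.sin_port = htons(7777);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    listener = socket(AF_INET, SOCK_STREAM, 0);
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 1);

    client = socket(AF_INET, SOCK_STREAM, 0);
    connect(client, (struct sockaddr *)&addr, sizeof(addr));
    server = accept(listener, NULL, NULL);

    /* The server side closes first, so the accepted connection on
     * port 7777 is the one that ends up in TIME_WAIT. */
    close(server);
    close(client);
    close(listener);

    /* On Linux this fails with EADDRINUSE until the TIME_WAIT timeout
     * expires, even though it was the accepted connection, not the
     * listening socket, that went through the close handshake. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        printf("re-bind: %s\n", strerror(errno));
    else
        printf("re-bind succeeded\n");
    return 0;
}

As far as I can tell, the equivalent re-bind succeeds on Windows.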
I don't really know, no. I tried to figure that out but couldn't make sense of the regular TCP specs or the state diagram either, and at this point I think it probably makes sense to implement SO_REUSEADDR and TIME_WAIT semantics manually rather than trying to change every POSIX environment anyway.
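Roughly along these lines (purely hypothetical sketch, not actual Wine code): always relax the host's check and do the Windows-style conflict checking ourselves:

#include <sys/socket.h>

/* Hypothetical helper: unconditionally set the host SO_REUSEADDR so the
 * host never rejects the bind because of TIME_WAIT, then enforce the
 * Windows rules above it. */
static int bind_with_windows_semantics(int fd, const struct sockaddr *addr,
                                       socklen_t len)
{
    int on = 1;

    /* Relax the host's TIME_WAIT / listening-port check. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    /* A real implementation would check here for conflicts with other
     * sockets we know about, applying the Windows SO_REUSEADDR /
     * SO_EXCLUSIVEADDRUSE rules, before handing the call to the host. */
    return bind(fd, addr, len);
}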
Should we add tests for the TIME_WAIT part as well?
I don't know how to do that 100% reliably for a non-flaky test; what I do locally is create child processes which connect to each other and then kill them. But that leaves the ports in different TCP connection states, and sometimes (rarely) they may get lucky and not hit the longest timeout. Should I maybe attach a local program which reproduces that?
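Roughly, it is something like this (from-memory sketch, error checking omitted; the port number and the sleeps are arbitrary):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>

int main(void)
{
    struct sockaddr_in addr = {0};
    pid_t server, client;
    int fd;

    addr.sin_family = AF_INET;
    addr.sin_port = htons(7777);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (!(server = fork()))
    {
        int l = socket(AF_INET, SOCK_STREAM, 0);
        bind(l, (struct sockaddr *)&addr, sizeof(addr));
        listen(l, 1);
        accept(l, NULL, NULL);
        pause();
    }
    sleep(1); /* crude: wait for the child to reach listen() */

    if (!(client = fork()))
    {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        connect(c, (struct sockaddr *)&addr, sizeof(addr));
        pause();
    }
    sleep(1); /* crude: wait for the connection to be established */

    /* The kernel closes the sockets on behalf of the killed processes;
     * depending on which side dies first and on timing, the port lands
     * in different TCP states and does not always hit the full timeout. */
    kill(server, SIGKILL);
    kill(client, SIGKILL);
    waitpid(server, NULL, 0);
    waitpid(client, NULL, 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        printf("bind: %s\n", strerror(errno));
    else
        printf("bind succeeded\n");
    return 0;
}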
Is it not sufficient to close sockets in-process?
I think in general there's value in tests that don't always fail without a fix, provided that they always succeed with it (i.e. they can still help prevent regressions), but in this case maybe it's not worth it if the test gets too ugly (or if the relevant case is too rarely hit).