On Thu, Aug 11, 2022, 3:02 AM Zebediah Figura (she/her) <zfigura@codeweavers.com> wrote:
> Doesn't this mean that we can get POLLOUT from the host but then be unable to write data? That sounds like a spec violation.
Unset POLLOUT does not necessarily imply that sendmsg() will block. On Linux (as of v5.19), POLLOUT is signaled on a connected TCP socket (that has not shut down) only if sk_wmem_queued is at most two thirds of sk_sndbuf, that is, only while at least one third of the send buffer is still free (see __sk_stream_is_writeable). In contrast, sendmsg() will happily accept data as long as any buffer space is left.
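As a rough illustration of the two thresholds, here is a userspace model of the checks as I read them in v5.19. The struct and the model_* names are made up for this sketch; only the field names mirror struct sock. It deliberately leaves out the protocol-specific half of __sk_stream_is_writeable (the TCP_NOTSENT_LOWAT check), so treat it as a simplification rather than the kernel's exact logic.

```c
#include <stdbool.h>

/* Hypothetical userspace model; field names mirror struct sock. */
struct sock_model {
    int sk_sndbuf;       /* send buffer limit */
    int sk_wmem_queued;  /* bytes currently charged to the send queue */
};

/* Free space left in the send buffer (modeled on sk_stream_wspace). */
static int model_wspace(const struct sock_model *sk)
{
    return sk->sk_sndbuf - sk->sk_wmem_queued;
}

/* Minimum free space required for POLLOUT (modeled on sk_stream_min_wspace). */
static int model_min_wspace(const struct sock_model *sk)
{
    return sk->sk_wmem_queued >> 1;
}

/* POLLOUT condition: free space must be at least half of what is queued,
 * which works out to sk_wmem_queued <= 2/3 * sk_sndbuf. */
static bool model_pollout(const struct sock_model *sk)
{
    return model_wspace(sk) >= model_min_wspace(sk);
}

/* Simplified sendmsg() condition: any free space at all is enough. */
static bool model_can_queue(const struct sock_model *sk)
{
    return sk->sk_wmem_queued < sk->sk_sndbuf;
}
```

With sk_sndbuf = 90000, the model reports POLLOUT up to sk_wmem_queued = 60000, while model_can_queue() stays true until the buffer is completely full.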
This seems to be related to why Linux doubles whatever SO_SNDBUF value you set on the socket: Linux stores the bookkeeping data in the same buffer as the application data, so it needs to raise the "writability" threshold. Also, memory pressure does not occur often, so perhaps it's not a problem in practice. However, I agree that it's not ideal that sendmsg() could block even after POLLOUT has been signaled. I'll test again with (P)MTU discovery disabled.
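For reference, the SO_SNDBUF doubling can be observed from userspace with something like the following. effective_sndbuf() is a hypothetical helper written for this mail, not part of any patch; it only uses the standard socket(), setsockopt() and getsockopt() calls.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: request a send buffer size on a fresh TCP socket
 * and return the size the kernel reports back, or -1 on any failure. */
int effective_sndbuf(int requested)
{
    int value = -1;
    socklen_t len = sizeof(value);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Set the requested size, then read back what the kernel stored. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) != 0)
        value = -1;

    close(fd);
    return value;
}
```

On Linux the value read back should normally be twice the request, as long as the request is within net.core.wmem_max and above the kernel's minimum; other systems may simply report the requested size.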
That said, it looks like TCP retransmission does not actually result in an increase of `sk_wmem_queued`. I'll edit accordingly in the next revision.