Hi all,
please consider the following problem:
On sockets, when POLLHUP is received, it is perfectly legitimate to continue reading data until a read() or recv() call returns 0 bytes.
The current wine implementation calls set_select_events( &sock->obj, -1 ) when POLLHUP or POLLERR is received, disabling any future events from the socket.
However, if an application uses asynchronous IO (or only asynchronous notification via WSAAsyncSelect() or WSAEventSelect()), this also inhibits reception of POLLIN events which can perfectly well occur if there are still unread data in the kernel buffers. Thus the app will never notice that there is still data to be read.
My first reaction to this was to comment out the above call to set_select_events() and continue listening to events from the socket. However, due to the semantics of poll(), this will cause a POLLHUP event on this socket for each and every iteration of the wineserver's main select loop, putting the server basically into a busy-wait state.
How can this be resolved? We seem to have the choice between
a) busy loop in the server (bad)
b) discarding data which could be legitimately read by the application (very bad).
Any hints/comments/ideas?
Please note that it is insufficient to queue all pending APCs on the socket when POLLHUP is received, because the app may very well queue some _after_ the close event arrives.
A minor point is that sock_poll_event() also sets a FD_CLOSE event (if present) every time it is called with POLLHUP set, effectively preventing any APC from being called, at least if the app waits for FD_CLOSE and asynchronous send/recv operations at the same time (likely). This can be worked around by setting FD_CLOSE only once and then blocking it, although I am not sure if that is correct behaviour (permanently signalling FD_CLOSE is obviously incorrect).
I hope someone has a good idea how to resolve this, Martin
My first reaction to this was to comment out the above call to set_select_events() and continue listening to events from the socket. However, due to the semantics of poll(), this will cause a POLLHUP event on this socket for each and every iteration of the wineserver's main select loop, putting the server basically into a busy-wait state.
well, can you just ignore the POLLHUP events in the server for that socket after the first one is received?
Martin Wilck wrote:
My first reaction to this was to comment out the above call to set_select_events() and continue listening to events from the socket. However, due to the semantics of poll(), this will cause a POLLHUP event on this socket for each and every iteration of the wineserver's main select loop, putting the server basically into a busy-wait state.
Could you use the set_select_events to say that you don't want any more POLLHUP events?
On Tue, 16 Apr 2002, Michael Cardenas wrote:
Could you use the set_select_events to say that you don't want any more POLLHUP events?
No you can't - that's how poll() works (at least on Linux).
<man poll>
int poll(struct pollfd *ufds, unsigned int nfds, int timeout);
[...]
The field revents is an output parameter, filled by the kernel with the events that actually occurred, either of the type requested, or of one of the types POLLERR or POLLHUP or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.)
</man poll>
Martin
I might just remark that, with the two changes I discussed in the previous message (do not call set_select_events( ..., -1 ), signal FD_CLOSE only once), I finally seem to get rid of the problems with the close()/shutdown() semantics that I talked about before.
But only at the cost of very high wineserver CPU usage.
Martin
On Tue, 16 Apr 2002, Martin Wilck wrote:
please consider the following problem:
Sounds like one we've discussed and resolved before.
On sockets, when POLLHUP is received, it is perfectly legitimate to continue reading data until a read() or recv() call returns 0 bytes.
Yes, it's still valid to read, but no more data will arrive on the socket, so the only thing you'll read is data that's already waiting in the kernel buffers. There's no reason to continue actively polling.
The current wine implementation calls set_select_events( &sock->obj, -1 ) when POLLHUP or POLLERR is received, disabling any future events from the socket.
Yes. POLLHUP cannot be masked out, so letting the socket continue to be polled results in a busy-wait and 100% CPU usage.
However, if an application uses asynchronous IO (or only asynchronous notification via WSAAsyncSelect() or WSAEventSelect()), this also inhibits reception of POLLIN events which can perfectly well occur if there are still unread data in the kernel buffers. Thus the app will never notice that there is still data to be read.
Not quite. No more data will arrive on the socket when it's in POLLHUP state, so there's no need for the server to poll. Instead, it notifies the app that there's still data remaining to be read next time it reads from the socket (this is what the "check whether condition is satisfied already" case in sock_reselect does, it explicitly does a poll even if the socket is removed from the main polling loop, in order to check for remaining data). Sending the app new FD_READ events only after it has read old data is a documented Windows feature; further FD_READ events are held back until the app reads old data, and this is what the wineserver implements, especially in the POLLHUP case.
A minor point is that sock_poll_event() also sets a FD_CLOSE event (if present) every time it is called with POLLHUP set, effectively preventing any APC from being called, at least if the app waits for FD_CLOSE and asynchronous send/recv operations at the same time (likely). This can be worked around by setting FD_CLOSE only once and then blocking it, although I am not sure if that is correct behaviour (permanently signalling FD_CLOSE is obviously incorrect).
The FD_CLOSE is only sent when there's no remaining data to be read (POLLIN is not present in the result of poll). It should not be a problem since the set_select_events( &sock->obj, -1 ) removes the socket from the main poll loop, so no further FD_CLOSEs would be sent.
On Tue, 16 Apr 2002, Ove Kaaven wrote:
Sounds like one we've discussed and resolved before.
Sorry, that was before my time :) Anyway, I think there are still open issues.
Yes, it's still valid to read, but no more data will arrive on the socket, so the only thing you'll read is data that's already waiting in the kernel buffers. There's no reason to continue actively polling.
Yes, but read events must still be signalled and/or async reads scheduled after POLLHUP has been received.
However, if an application uses asynchronous IO (or only asynchronous notification via WSAAsyncSelect() or WSAEventSelect()), this also inhibits reception of POLLIN events which can perfectly well occur if there are still unread data in the kernel buffers. Thus the app will never notice that there is still data to be read.
Not quite. No more data will arrive on the socket when it's in POLLHUP state, so there's no need for the server to poll. Instead, it notifies the app that there's still data remaining to be read next time it reads from the socket (this is what the "check whether condition is satisfied already" case in sock_reselect does, it explicitly does a poll even if the socket is removed from the main polling loop, in order to check for remaining data).
Hmm - this code went away with CVS version 1.28 of sock.c. Perhaps we need to reintroduce it somehow.
Sending the app new FD_READ events only after it has read old data is a documented Windows feature; further FD_READ events are held back until the app reads old data, and this is what the wineserver implements, especially in the POLLHUP case.
Sorry, I cannot see what you are talking about, not in the current CVS version. All that happens after a read() is issued is _enable_event (FD_READ, 0, 0), and that clears the pending mask for FD_READ. How would that ever be set again, and FD_READ signalled? (Consider an app that does not read _all_ remaining data after FD_CLOSE but only part of it and then issues further asynchronous read()s).
The key probably is that sock_poll_event() was called from sock_reselect() if the "condition is satisfied", which is no longer the case in 1.28.
Thanks for replying anyway, Martin
On Wed, 17 Apr 2002, Martin Wilck wrote:
However, if an application uses asynchronous IO (or only asynchronous notification via WSAAsyncSelect() or WSAEventSelect()), this also inhibits reception of POLLIN events which can perfectly well occur if there are still unread data in the kernel buffers. Thus the app will never notice that there is still data to be read.
Not quite. No more data will arrive on the socket when it's in POLLHUP state, so there's no need for the server to poll. Instead, it notifies the app that there's still data remaining to be read next time it reads from the socket (this is what the "check whether condition is satisfied already" case in sock_reselect does, it explicitly does a poll even if the socket is removed from the main polling loop, in order to check for remaining data).
Hmm - this code went away with CVS version 1.28 of sock.c. Perhaps we need to reintroduce it somehow.
I can send you the sock.c I have if you wish, but you may be able to check out an earlier revision yourself.
Sending the app new FD_READ events only after it has read old data is a documented Windows feature; further FD_READ events are held back until the app reads old data, and this is what the wineserver implements, especially in the POLLHUP case.
Sorry, I cannot see what you are talking about, not in the current CVS version. All that happens after a read() is issued is _enable_event (FD_READ, 0, 0), and that clears the pending mask for FD_READ. How would that ever be set again, and FD_READ signalled?
req->mask is here FD_READ, and the handler contains
sock->pmask &= ~req->mask;
sock->hmask &= ~req->mask;
sock->state |= req->sstate;
sock->state &= ~req->cstate;
sock_reselect( sock );
where sock_reselect includes this at the end
/* check whether condition is satisfied already */
pfd.fd = sock->obj.fd;
pfd.events = ev;
pfd.revents = 0;
poll( &pfd, 1, 0 );
if (pfd.revents) sock_poll_event( &sock->obj, pfd.revents );
result: sock_poll_event sets the FD_READ flag and notifies the application in response to the _enable_event, i.e. after it has read something, even if the socket is not in the main poll loop. We never got around to changing the comment to explain this additional function.
The key probably is that sock_poll_event() was called from sock_reselect() if the "condition is satisfied", which is no longer the case in 1.28.
If this code path was removed, then you'll probably have to put it back in.
On Wed, 17 Apr 2002, Ove Kaaven wrote:
I can send you the sock.c I have if you wish, but you may be able to check out an earlier revision yourself.
Yeah, I checked out 1.27 and am studying it right now.
If this code path was removed, then you'll probably have to put it back in.
I guess I need to do some more things for the overlapped IO, but basically you are right.
I have 2 questions:
1. In the code you're talking about, there is a potential mutual recursion because sock_get_poll_events calls sock_reselect, which may in turn call sock_get_poll_events. I understand that infinite recursion is prevented by handling the sock->hmask field, but I wonder whether it is necessary at all that sock_get_poll_events calls sock_reselect() and not only set_select_events(sock_get_poll_events())? I'd prefer a solution where no call sequence sock_get_poll_events->sock_reselect->sock_get_poll_events->... would be possible.
2. IMO, the code snippet
if (event & POLLIN)
{
    char dummy;
    /* Linux 2.4 doesn't report POLLHUP if only one side of the socket
     * has been closed, so we need to check for it explicitly here */
    if (!recv( sock->obj.fd, &dummy, 1, MSG_PEEK ))
        event = POLLHUP;
}
[...]
if (event & (POLLERR|POLLHUP)) set_select_events( &sock->obj, -1 );
is wrong, because an empty read only signals that the upstream connection was closed. The downstream connection may still be open, i.e. FD_WRITE events may well occur. In this case it seems to be wrong to do set_select_events( &sock->obj, -1 ) _unless_ the kernel had signalled POLLHUP|POLLIN, which means that both sides of the socket have been closed. Thus, if we have an empty read but no POLLHUP from the kernel, we should signal FD_CLOSE to the app, but not pretend we had got POLLHUP as the current code does.
Right?
Martin
On Wed, 17 Apr 2002, Martin Wilck wrote:
- In the code you're talking about, there is a potential mutual recursion
because sock_get_poll_events calls sock_reselect, which may in turn call sock_get_poll_events.
sock_get_poll_events does not call sock_reselect in the code I'm referring to, I see no reason it should. Its sole purpose is to return the event mask that should be polled next, it should not effectuate this mask. Or are you talking about something else? Hmm, sounds like you may be talking about sock_poll_event... ok, let's see...
I understand that infinite recursion is prevented by handling the sock->hmask field, but I wonder whether it is necessary at all that sock_get_poll_events calls sock_reselect() and not only set_select_events(sock_get_poll_events())?
The event mask to check for may change during sock_poll_event and it would be nice to check for those new events immediately. But at the moment I don't recall any situation where it would be necessary.
I'd prefer a solution where no call sequence sock_get_poll_events->sock_reselect->sock_get_poll_events->... would be possible.
IMO, the code snippet
if (event & POLLIN)
{
    char dummy;
    /* Linux 2.4 doesn't report POLLHUP if only one side of the socket
     * has been closed, so we need to check for it explicitly here */
    if (!recv( sock->obj.fd, &dummy, 1, MSG_PEEK ))
        event = POLLHUP;
}
[...]
if (event & (POLLERR|POLLHUP)) set_select_events( &sock->obj, -1 );
is wrong, because an empty read only signals that the upstream connection was closed.
Hmm. I don't think I wrote that code piece, so I don't recall its justification. It may well be that it was ill-conceived (at least I don't quite like the looks of that code), especially since POLLHUP is the "problem" we tried to solve with all this insidious code, not its "solution". There used to be different code to detect a closed socket, but it seems it was removed for some reason (probably because it had other issues), and then maybe that thing above was hacked on top because the old code was no longer there?
The downstream connection may still be open, i.e. FD_WRITE events may well occur. In this case it seems to be wrong to do set_select_events( &sock->obj, -1 ) _unless_ the kernel had signalled POLLHUP|POLLIN, which means that both sides of the socket have been closed. Thus, if we have an empty read but no POLLHUP from the kernel, we should signal FD_CLOSE to the app, but not pretend we had got POLLHUP as the current code does.
Right?
Yeah.
On Wed, 17 Apr 2002, Ove Kaaven wrote:
are you talking about something else? Hmm, sounds like you may be talking about sock_poll_event... ok, let's see...
Of course I meant sock_poll_event. Well, I am a bit confused ... :-)
The event mask to check for may change during sock_poll_event and it would be nice to check for those new events immediately. But at the moment I don't recall any situation where it would be necessary.
I am currently trying to split the sock_reselect functionality and introduce a separate function that takes over the checking whether the socket has been kicked out of the main select loop.
I just need to figure out where exactly I need to call that function. I am also not sure whether any combination of the FD_... and FD_WINE... flags suffices to tell if the socket is in the select loop or not, because of the many different situations that may affect these flags.
Even after working on this code for a few months now, I still find the management of the socket state flags and various masks pretty confusing. But I'm getting there.
Martin
Dear Ove -
thanks a lot for your comments & explanations. It seems that I will finally be able to present an implementation of overlapped IO on sockets that works.
Martin
Ove,
can you comment on this line in sock_reselect:
/* previously unconnected socket, is this reselect supposed to connect it? */
if (!(sock->state & ~FD_WINE_NONBLOCKING)) return 0;
Was this a short way of expressing
if ( !(sock->state & (FD_READ|FD_WRITE|FD_OOB|FD_ACCEPT|FD_CONNECT|FD_CLOSE|
                      FD_WINE_LISTENING|FD_WINE_CONNECTED|FD_WINE_RAW) ))
??
Rationale: I have been adding new flags (FD_WINE_SEND_SHUTDOWN, FD_WINE_RECV_SHUTDOWN) and my guess is they (being internal) should be treated like FD_WINE_NONBLOCKING here, but I'm uncertain.
Martin
On Thu, 18 Apr 2002, Martin Wilck wrote:
can you comment on this line in sock_reselect:
/* previously unconnected socket, is this reselect supposed to connect it? */
if (!(sock->state & ~FD_WINE_NONBLOCKING)) return 0;
Was this a short way of expressing
if ( !(sock->state & (FD_READ|FD_WRITE|FD_OOB|FD_ACCEPT|FD_CONNECT|FD_CLOSE|
                      FD_WINE_LISTENING|FD_WINE_CONNECTED|FD_WINE_RAW) ))
??
Sort of. FD_WINE_NONBLOCKING is the only flag that doesn't mean the socket should enter the poll loop. (Putting a newly created (unused) connection-oriented (TCP) socket into the poll loop means POLLHUP on Linux 2.4, which is obviously bad.)
FD_READ | FD_WRITE | FD_OOB: the socket is ready for data traffic (datagram socket or connected stream socket), should be polled
FD_ACCEPT: the socket is listening, should be polled
FD_CONNECT: connection is in progress, should be polled
FD_CLOSE: not used, but would have meant should be polled for closes
FD_WINE_LISTENING: same as FD_ACCEPT
FD_WINE_CONNECTED: connected socket, should be polled
FD_WINE_RAW: a raw socket is a datagram socket, always pollable
FD_WINE_NONBLOCKING: API control flag, not a socket state, should not affect polling
Rationale: I have been adding new flags (FD_WINE_SEND_SHUTDOWN, FD_WINE_RECV_SHUTDOWN) and my guess is they (being internal) should be treated like FD_WINE_NONBLOCKING here, but I'm uncertain.
If these are API control flags (like the nonblocking flag) and it should be possible to set them without putting the socket into the main poll loop, then you should treat them like that; but if the flags convey state and can only occur on already-connected sockets, then there's no reason to check for them.