Mike McCormack wrote:
Just got your last mail regarding races... From what I can see, it should now behave the same way as select_loop().
No, it does not. Sorry.
When using poll, the important field we look at is revents. Let's recap the problem:

1. epoll is called, and marks the users at offsets 1, 2 and 3 as having interesting events to handle.
2. The handling of event 1 removes user 3 from the list. Its fd is deallocated and its entry is cleared. Node 3 is added to the beginning of the free nodes list.
3. Event 2 is handled. During that handling it asks for a new node. Since 3 is at the beginning of the free list, that's what it gets. It sets a new fd for it, and asks it to wait on certain unrelated events.
4. The loop handling the epoll events arrives at event 3.

Had that been the "poll" loop, we would be looking at the revents field. That field was cleared by the removal in step 2, and since no call to poll has happened since the node was re-added in step 3, we would correctly surmise that there is nothing interesting to be done for this user, and move on. We would be inefficient, as the counter of handled users would not reach zero and we would have to scan the entire list. However, we would not perform incorrect processing.
With your patch, things are different. The "revents" equivalent is stored in an array dedicated to the epoll results, and it is impossible for the del-user function to clear it. We do check that "Events" is non-zero, but that check passes, because the stale value is still sitting in the results array. We therefore think that the events flagged for the old occupant of user #3 actually belong to the new occupant, and we handle them incorrectly.
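To make the difference concrete, here is a minimal standalone sketch of the scenario above. It is not the actual server code: every name in it (struct user, del_user, add_user, handle_event, run_poll_style, run_epoll_style) is hypothetical, the fd values are made up, and no real poll() or epoll_wait() call is made. The "ready" events are simply filled in by hand to mimic what those calls would have reported.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_USERS 4
#define EV_READ   1

/* Hypothetical user slot: poll-style results (revents) live inside it. */
struct user {
    int fd;         /* -1 when the slot is free */
    int revents;    /* cleared whenever the slot is freed or reused */
    int next_free;  /* link in the free list, -1 terminates */
};

/* Hypothetical epoll-style result: stored outside the user slot. */
struct result {
    int idx;        /* offset of the user the event was reported for */
    int events;
};

static struct user users[MAX_USERS];
static int free_head;
static int handled[MAX_USERS];  /* events dispatched to each slot */

static void init_users(void)
{
    for (int i = 0; i < MAX_USERS; i++) {
        users[i].fd = 100 + i;
        users[i].revents = 0;
        users[i].next_free = -1;
        handled[i] = 0;
    }
    free_head = -1;
}

static void del_user(int idx)
{
    users[idx].fd = -1;
    users[idx].revents = 0;           /* the only result storage we can reach */
    users[idx].next_free = free_head; /* push onto the front of the free list */
    free_head = idx;
}

static int add_user(int fd)
{
    int idx = free_head;
    assert(idx != -1);                /* sketch only: no growth path */
    free_head = users[idx].next_free;
    users[idx].fd = fd;
    users[idx].revents = 0;
    return idx;
}

/* Steps 2 and 3 of the scenario: event 1 removes user 3, event 2 adds a user. */
static void handle_event(int idx)
{
    handled[idx]++;
    if (idx == 1) del_user(3);
    if (idx == 2) add_user(200);
}

/* poll-style loop: returns how many events hit the new occupant of slot 3. */
int run_poll_style(void)
{
    init_users();
    for (int i = 1; i <= 3; i++) users[i].revents = EV_READ;
    for (int i = 0; i < MAX_USERS; i++)
        if (users[i].fd != -1 && users[i].revents)
            handle_event(i);
    return handled[3];
}

/* epoll-style loop: same return value, but results sit in a separate array. */
int run_epoll_style(void)
{
    struct result results[3] = { {1, EV_READ}, {2, EV_READ}, {3, EV_READ} };
    init_users();
    for (int i = 0; i < 3; i++)
        if (users[results[i].idx].fd != -1 && results[i].events)
            handle_event(results[i].idx);
    return handled[3];
}
```

In this sketch run_poll_style() never dispatches anything to slot 3, because del_user() zeroed the revents it was going to look at. run_epoll_style() dispatches the old occupant's event to the new fd, because del_user() has no way to reach the results array, and the "slot is in use" and "events non-zero" checks both pass.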
Shachar