On 3/5/22 01:52, Jinoh Kang wrote:
On 3/5/22 09:13, Zebediah Figura wrote:
On 3/4/22 13:04, Jinoh Kang wrote:
On 3/5/22 02:47, Zebediah Figura wrote:
Right, but I think it's possible to trigger this assert now. For a rather pathological case, consider a client which calls set_async_direct_result on an async which is currently being serviced by a device driver and has not yet reached get_next_device_request.
For that to be possible, something has to terminate the async so that set_async_direct_result does not reject it. The easiest way to do so is to cancel it, or make it timeout somehow.
Okay, I don't think that's possible for a device async. I'm still more comfortable removing the assert in this patch, though; I'm not sure that there isn't a way to wrangle a server crash...
Understood.
In fact, I think we have discovered an _existing_ bug: the client can use set_async_direct_result to interfere with cancelled asyncs (device-backed or not).
Sure, but the client is basically considered free to sabotage itself.
My concern was that it might interfere with the driver's assumptions; looks like it shouldn't happen, though.
We just can't let it crash the server, or mess with an object that it doesn't have the right access bits to.
Further investigation revealed some other bugs, which is tangentially related:
- An async may time out before the client calls set_async_direct_result to report back the status of initial I/O. In this case, the timeout is ignored.
I don't think this is a problem /a priori/, though.
The problem is that a timeout ignored *once* is ignored *forever* throughout the async's lifetime. We don't restore the timeout at all once the async enters the pending state.
An example scenario is as follows:
- Client calls recv_socket.
- Server sets async timeout and returns.
- The timeout is triggered, but the async is already in terminated state; thus, it is ignored.
- Client calls set_async_direct_result with status = STATUS_PENDING.
- Now, the async will potentially linger forever now that the timeout has already been expired.
Ah, right. Unless I'm mistaken that's also a preëxisting problem.
In fact, I'm somewhat concerned that if we let synchronous I/O time out before it even had a chance to start, we would break applications that expect it to work.
- There is a time window where alerted asyncs ignore IoCancel/IoCancelEx requests. If async_terminate( async, STATUS_ALERTED ) is called, the async will ignore any cancellation requests until the APC_ASYNC_IO routine returns.
Note that this bug has existed *before* set_async_direct_result, and I think this one may need to be addressed first.
Yes, this has been a bug for a while. I don't remember if this was a problem before the async rework in 7.0. We probably need to "buffer" the cancellation request somehow, and process it once the async is restarted. (As above, I don't think we want to cancel the async if the client is going to report success to us.)
Sounds like a great idea. Not sure which one is the best way to implement it, though. A few options come to my mind:
- Allow async_terminate() to override status if async->alerted is set, even if it is already terminated.
We used to have this (before a5b6e90d48e), but I don't think this actually helps. If the client restarts we're back where we started. We could account for it there but I don't much like the added complexity.
- Add a separate status field for buffering cancellation requests.
This seems more desirable to me. We'll need to do the same thing with timeouts.
So what I think we need here is a mechanism to tell alerted asyncs apart from irreversibly-terminated asyncs. This will solve all of the problems above:
- It's no longer possible to call set_async_direct_result on a device async in the first place, since it will now reject irreversibly-terminated asyncs while still accepting alerted asyncs (which is not possible for device asyncs).
- Timeout and cancellation works as expected: set_async_direct_result will treat timed out asyncs as irreversibly-terminated. (the client needs to be modified to handle this case explicitly, though.)
Any thoughts?
Maybe I'm missing something, but isn't that what the "alerted" field is for?
You're right. Sorry for missing that. Also, the cancellation buffering approach you suggested makes this unnecessary anyway.