1/6: It seems weird that we're keeping the "wait" object associated with an IOCP even when we're not waiting on it. That is, I'd expect us to remove the completion_wait from an IOCP as soon as a remove_completion call succeeds, or when the wait times out. I guess it works because wake_up() won't actually do anything if a thread isn't queued on its completion_wait, but it doesn't seem very declarative. Or maybe we need to do it like this for some reason I'm not thinking of?
A bit unrelated, but the association of thread to IOCP is a Windows concept described in [1].
Yes, it is directly needed for 3/6. Also, while I didn't dig into all the non-obvious details of that and didn't attempt an implementation, limiting the "active" thread count has something to do with it.
Besides, I think not caring whether a thread is actually waiting on the IOCP wait object is a great deal of simplification here, leaving all the logic to the generic wineserver wait queue mechanics. The way it is done now, it doesn't have to hook various wait-related events and know whether the thread is waiting, timed out, or got suspended before or while waiting; all of that is handled by the generic wait logic. The logic is:
1. After a pending status from (remove_completion) the thread gets a wait handle. It is free to wait on it forever or with a timeout.
2. As soon as the thread's wait is satisfied, the server considers the completion assigned to it. The thread is normally supposed to fetch its completion with get_thread_completion. It may also be suspended before that and fetch it after resume, or it may be forcibly terminated and that completion will be lost; none of that should be a problem.
3. If the thread timed out waiting on the completion wait, it is not in the server wait queue on this object and has no completion assigned to it. There is no difference between a thread that previously successfully removed a completion and one that timed out: either way it is associated with the completion port and not in the wait queue on this object. A subsequent call to (remove_completion) on the same completion port without a completion ready will reuse the same wait object and handle without changing anything.
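Not part of the patches, just to illustrate that flow from the client side: a minimal sketch, assuming Wine's winternl.h prototypes for NtCreateIoCompletion()/NtRemoveIoCompletion()/NtSetIoCompletion() and linking against ntdll. The first two calls time out (steps 1 and 3: the thread stays associated with the port and the same server-side wait object and handle are reused), and once a completion is posted the wait is satisfied and the completion is fetched (step 2).

    #include <windows.h>
    #include <winternl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE port;
        IO_STATUS_BLOCK iosb;
        ULONG_PTR key, value;
        LARGE_INTEGER timeout;
        NTSTATUS status;

        NtCreateIoCompletion( &port, IO_COMPLETION_ALL_ACCESS, NULL, 0 );

        /* Steps 1 and 3: nothing is queued, so both calls time out; the thread
         * remains associated with the port and the second call reuses the same
         * server-side wait object and handle. */
        timeout.QuadPart = -100 * 10000;  /* relative 100 ms in 100ns units */
        status = NtRemoveIoCompletion( port, &key, &value, &iosb, &timeout );
        printf( "first call:  %#lx (expecting STATUS_TIMEOUT)\n", (unsigned long)status );
        status = NtRemoveIoCompletion( port, &key, &value, &iosb, &timeout );
        printf( "second call: %#lx (expecting STATUS_TIMEOUT)\n", (unsigned long)status );

        /* Step 2: once a completion is queued the wait is satisfied and the
         * thread fetches its completion. */
        NtSetIoCompletion( port, 0x11, 0x22, STATUS_SUCCESS, 33 );
        status = NtRemoveIoCompletion( port, &key, &value, &iosb, NULL );
        printf( "third call:  %#lx, key %#Ix, value %#Ix\n", (unsigned long)status, key, value );

        NtClose( port );
        return 0;
    }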
I am under the impression that there is nothing wrong with relying on the wait_satisfied() behaviour; a few sync objects do that.
Why do we keep destroying and recreating the completion_wait? I'd expect us to either destroy it immediately when it's not needed (cf. the above question), or keep it around for the lifetime of the thread and simply retarget it to a different IOCP.
Well, I thought it was a bit simpler this way: the completion wait is associated with a (thread, IOCP) pair from creation until destruction, so there is no transition, just creation and a universal cleanup. But it is also not hard to retarget it to a different IOCP in (remove_completion); I will do it that way.
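To make the intended change concrete, here is a purely hypothetical model (made-up names and types, not the wineserver ones): the per-thread wait is created lazily and, when a request comes in for a different port, it is simply repointed rather than destroyed and recreated.

    #include <stdlib.h>

    struct model_port;                       /* stands in for the completion object */

    struct model_completion_wait             /* stands in for the per-thread wait */
    {
        struct model_port *port;             /* port the wait currently targets */
    };

    struct model_thread
    {
        struct model_completion_wait *wait;  /* created lazily, kept until thread exit */
    };

    /* Called for a pending remove-completion request: allocate the wait on first
     * use, otherwise just retarget the existing one to the requested port. */
    static struct model_completion_wait *get_completion_wait( struct model_thread *thread,
                                                               struct model_port *port )
    {
        if (!thread->wait && !(thread->wait = calloc( 1, sizeof(*thread->wait) ))) return NULL;
        thread->wait->port = port;           /* retarget instead of destroy/recreate */
        return thread->wait;
    }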
In completion_destroy() you set wait->completion to NULL, which I believe will prevent the list_remove() in cleanup_thread_completion(), which doesn't look right. [Cheating and looking ahead, that's fixed in 4/6.]
Yes, it seems like it is not needed (while also probably not harmful); I will remove it. It got there while splitting the patches. Note that patch 4/6 has this in _close_handle(), where it is needed to "detach" the wait object from the completion (on closing the port handle both direct and "internal" waits should be satisfied, while there is no obvious need to keep the completion object around and complicate things by referencing it from wait objects).
2/6:
The subject of this patch suggests that it fixes the bug you mention (where posting a packet and then immediately closing the port often fails with STATUS_INVALID_HANDLE) but according to my reading that's actually fixed in 4/6 [whose title would not suggest this]. I'd advocate for fixing the split for the sake of future archaeology, but I may be overruled on this concern.
It seems to me 4/6 fixes a different bug? There are two of them (not exactly similar):
- A thread is waiting on a completion port, either directly with WaitFor... from the app or from NtRemoveIoCompletion (now on a separate object). The last handle to the port is closed (without any completion available on the port). Currently the waits will hang forever (or until the timeout, if any). This is fixed by 4/6.
- A thread is waiting in NtRemoveIoCompletion[Ex](). Another thread posts a completion and closes the handle. Both currently and with these patches the waiting thread will be woken by the posted completion, but currently the attempt to get the completion will fail because the port handle is already invalid. This is solved by 2/6.
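A minimal reproducer sketch of the second scenario (illustrative only, assuming the same winternl.h prototypes as above; not taken from the actual tests in the patches):

    #include <windows.h>
    #include <winternl.h>
    #include <stdio.h>

    static HANDLE port;

    static DWORD WINAPI wait_thread( void *arg )
    {
        IO_STATUS_BLOCK iosb;
        ULONG_PTR key, value;
        NTSTATUS status;

        (void)arg;
        /* Blocks until a completion is queued. */
        status = NtRemoveIoCompletion( port, &key, &value, &iosb, NULL );
        /* The posted completion wakes this thread both before and after the
         * patches, but without 2/6 fetching it often fails with
         * STATUS_INVALID_HANDLE because the port handle is already closed. */
        printf( "NtRemoveIoCompletion: %#lx\n", (unsigned long)status );
        return 0;
    }

    int main(void)
    {
        HANDLE thread;

        NtCreateIoCompletion( &port, IO_COMPLETION_ALL_ACCESS, NULL, 0 );
        thread = CreateThread( NULL, 0, wait_thread, NULL, 0, NULL );
        Sleep( 100 );                             /* let the other thread block */

        NtSetIoCompletion( port, 0x11, 0x22, STATUS_SUCCESS, 0 );
        NtClose( port );                          /* close right after posting */

        WaitForSingleObject( thread, INFINITE );
        CloseHandle( thread );
        return 0;
    }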
3/6:
Ah, so that explains why we need to keep the thread associated. I'm mildly curious if that holds if we wait on a different IOCP instead of the same one?
I just tested that: associating with a different IOCP beforehand doesn't change anything (as I would expect). Also, getting an APC from NtRemoveIoCompletionEx on a not yet associated port doesn't make the association. I will include these in the tests.
Why the "ok(count <= 1)"? Does this differ between Windows versions, and, if so, can we document this in the code please?
Yes, it is older Windows behaviour (probably a small bug?) where NtRemoveIoCompletionEx() always returns 1 in the output completion count in case of errors (while no completion is actually returned). That is no longer the case at least on Win11, and the same test_set_io_completion() has the same check below, where I didn't touch anything. I guess in this place I will just remove the check; it is irrelevant here.
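If the check were kept rather than removed, the usual way to document the version difference would be a broken() case in the test, along the lines of this fragment (variable names assumed from the surrounding test, not copied from the patch):

    count = 0xdeadbeef;
    res = pNtRemoveIoCompletionEx( h, info, 2, &count, &timeout, FALSE );
    ok( res == STATUS_TIMEOUT, "NtRemoveIoCompletionEx failed: %#lx\n", res );
    ok( !count || broken(count == 1) /* before Win11 the count is set to 1 on error */,
        "wrong count %lu\n", count );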
[1] https://learn.microsoft.com/en-us/windows/win32/fileio/i-o-completion-ports