On Sat Jan 25 04:40:34 2025 +0000, Paul Gofman wrote:
> Your patch doesn't change whether NtYieldExecution is called or not, and thus has no chance to affect any host "priority boost". I actually doubt that it works this way now, giving any additional "boost" after we already waited for server call, but this is a different story unrelated to this patch. And especially that NtYieldExecution worth calling after non-zero timeout; if the thread slept for some time it did perform the yield, no way around. in a much more sure way than what NtYieldExecution does which is de-facto no-op most often. But even with zero timeout server call did a blocking wait. _NO_YIELD_PERFORMED from NtYieldExecution() only reflects reality when there were no other blocking native system calls on the way. Basing this status on the sole return of NtYieldExecution in server_wait() on that path is plainly wrong, it doesn't take into account yield performed during server call. So returning _NO_YIELD_PERFORMED based on that addition NtYieldExecution may only confuse the app, hinting it to wait additionally. As a bonus, ignoring that from server_wait() reduces the functional part of the patch to 2-3 lines.
That's not quite what I meant; I know I'm not changing that behavior. I just meant that the alertable waits that **do** hit the yield (timed out) get an additional **penalty**, which in turn gives the waits that **don't** hit the yield a slight advantage ("boost") over them. It's rudimentary and perhaps symbolic, given the weakness of `sched_yield`, but I think it makes sense when you compare it to Windows' behavior.
Before I wrote the Wine tests, I made a standalone test that performs 32000 alertable zero-timeout `NtDelayExecution` calls and ran it on Windows and on Wine:
**Windows 10 22H2** VM (qemu):
Default timer resolution:

```
Total execution time: 156 ms

Yield Return Status Distribution:
STATUS_SUCCESS: 19170 (59.9%)
STATUS_TIMEOUT: 0 (0.0%)
STATUS_NO_YIELD_PERFORMED: 12802 (40.0%)
STATUS_USER_APC: 28 (0.1%)
Other statuses: 0 (0.0%)
```

1ms timer resolution:

```
Total execution time: 47 ms

Yield Return Status Distribution:
STATUS_SUCCESS: 647 (2.0%)
STATUS_TIMEOUT: 0 (0.0%)
STATUS_NO_YIELD_PERFORMED: 31328 (97.9%)
STATUS_USER_APC: 25 (0.1%)
Other statuses: 0 (0.0%)
```

It seems that on newer Windows the `NtDelayExecution` return value depends heavily on the timer resolution, but with the default of 16ms, it's a roughly 60/40 split between STATUS_SUCCESS and STATUS_NO_YIELD_PERFORMED on an idle system. Curiously, Windows 7 matches the 16ms resolution results of Windows 10, regardless of timer resolution.
Now, here's **Wine after this patch** (only 1ms matters on Wine, obviously):

```
Total execution time: 31 ms

Yield Return Status Distribution:
STATUS_SUCCESS: 181 (0.6%)
STATUS_TIMEOUT: 0 (0.0%)
STATUS_NO_YIELD_PERFORMED: 31789 (99.3%)
STATUS_USER_APC: 30 (0.1%)
Other statuses: 0 (0.0%)
```

The ratio between SUCCESS/NO_YIELD_PERFORMED is eerily similar to Windows 10 with a timer interval of 1ms. That's why I think it's correct the way it is.
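For anyone who wants to reproduce this, here is a minimal sketch of the kind of test loop described above. It is not the exact program that produced these numbers: the `tally` struct, its helper functions and the `ST_*` macro names are hypothetical, it does not queue any APCs (so it would not report any STATUS_USER_APC on its own), and it leaves the timer resolution at whatever the system currently uses. The status values are the ones from the Windows SDK's `ntstatus.h`, and `NtDelayExecution` is looked up from ntdll at runtime.

```c
#include <assert.h>
#include <stdio.h>

/* NTSTATUS values as defined in the Windows SDK's ntstatus.h.
 * The ST_ prefix is only used here to avoid clashing with SDK macros. */
#define ST_SUCCESS            0x00000000u
#define ST_USER_APC           0x000000C0u
#define ST_TIMEOUT            0x00000102u
#define ST_NO_YIELD_PERFORMED 0x40000024u

/* Hypothetical helper type: one counter per status of interest. */
struct tally { unsigned success, timeout, no_yield, user_apc, other, total; };

/* Classify one returned status and bump the matching counter. */
static void tally_add(struct tally *t, unsigned int status)
{
    assert(t != NULL);
    switch (status)
    {
    case ST_SUCCESS:            t->success++;  break;
    case ST_TIMEOUT:            t->timeout++;  break;
    case ST_NO_YIELD_PERFORMED: t->no_yield++; break;
    case ST_USER_APC:           t->user_apc++; break;
    default:                    t->other++;    break;
    }
    t->total++;
}

/* Share of `count` in the total, in percent (0 for an empty tally). */
static double tally_pct(const struct tally *t, unsigned count)
{
    return t->total ? 100.0 * count / t->total : 0.0;
}

static void tally_print(const struct tally *t)
{
    printf("Yield Return Status Distribution:\n");
    printf("STATUS_SUCCESS: %u (%.1f%%)\n", t->success, tally_pct(t, t->success));
    printf("STATUS_TIMEOUT: %u (%.1f%%)\n", t->timeout, tally_pct(t, t->timeout));
    printf("STATUS_NO_YIELD_PERFORMED: %u (%.1f%%)\n", t->no_yield, tally_pct(t, t->no_yield));
    printf("STATUS_USER_APC: %u (%.1f%%)\n", t->user_apc, tally_pct(t, t->user_apc));
    printf("Other statuses: %u (%.1f%%)\n", t->other, tally_pct(t, t->other));
}

#ifdef _WIN32
#include <windows.h>

typedef LONG (WINAPI *NtDelayExecution_t)(BOOLEAN alertable, const LARGE_INTEGER *timeout);

int main(void)
{
    NtDelayExecution_t pNtDelayExecution = (NtDelayExecution_t)
        GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtDelayExecution");
    struct tally t = {0};
    LARGE_INTEGER zero;
    DWORD start;
    int i;

    if (!pNtDelayExecution) return 1;
    zero.QuadPart = 0; /* zero timeout: yield only, no actual sleep */

    start = GetTickCount();
    for (i = 0; i < 32000; i++)
        tally_add(&t, (unsigned int)pNtDelayExecution(TRUE /* alertable */, &zero));

    printf("Total execution time: %lu ms\n", (unsigned long)(GetTickCount() - start));
    tally_print(&t);
    return 0;
}
#endif
```

To compare the two Windows configurations, the timer resolution would have to be raised before the loop (for example with `timeBeginPeriod(1)`), which this sketch leaves out.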