Pretty much the same kind of thing Proton does; this is only for non-shared semaphores for now.
Well, Proton doesn't do that for timeline semaphores; it is only done for d3d12 shared fences. I think the main issue here is that doing this for every timeline semaphore is a no-go performance-wise. I know timeline semaphores may show similar semantics on Windows, at least on NVidia (but is that the same on AMD and Intel? not sure), but nothing is known to depend on such behaviour.
The (severe) performance problem comes from the fact that such wrapping routes GPU-side waits through CPU-side scheduling. That is really unfortunate and stalls the GPU pipelines. It is especially unfortunate on NVidia, where for some reason CPU waits on timeline semaphores have a significant wake-up delay. I think we cannot afford that in general Vulkan translation (even if there is some native Vulkan app depending on this out-of-spec behaviour). Proton only gets away with it for d3d12 shared fences because those are not used extensively, unlike Vulkan timeline semaphores (both in Vulkan games and in d3d translation layers). I think we should only be doing this (along the lines of this patchset) for backing d3d12 fences.
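To illustrate what I mean by routing GPU waits through CPU scheduling, here is a minimal sketch of the wrapping pattern (struct scheduler_op and scheduler_execute are made-up names, not existing code): the submit that was supposed to wait on the GPU side is held back by a host-side vkWaitSemaphores() on the scheduling thread, so the queue sits idle until that thread wakes up.

```c
#include <vulkan/vulkan.h>

struct scheduler_op
{
    VkDevice device;
    VkSemaphore wrapped;    /* the emulated timeline semaphore */
    uint64_t wait_value;    /* value the submit was supposed to wait on */
    VkQueue queue;
    VkSubmitInfo2 submit;   /* the real work, with the wait stripped out */
};

/* Runs on the CPU scheduling thread (cf. timeline_thread()). */
static VkResult scheduler_execute(struct scheduler_op *op)
{
    VkSemaphoreWaitInfo wait_info =
    {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
        .semaphoreCount = 1,
        .pSemaphores = &op->wrapped,
        .pValues = &op->wait_value,
    };
    VkResult vr;

    /* The GPU-side wait becomes a host wait: the queue has nothing to do
     * until this thread wakes up, which is where the pipeline stall comes
     * from (and the NVidia wake-up latency makes it worse). */
    if ((vr = vkWaitSemaphores(op->device, &wait_info, UINT64_MAX)) != VK_SUCCESS)
        return vr;

    /* Only now does the real work reach the driver. */
    return vkQueueSubmit2(op->queue, 1, &op->submit, VK_NULL_HANDLE);
}
```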
Then, a server call to signal a shared fence value seems like an entire no-go on its own. Maybe we can do that as a special case for cross-process fences (when a fence is known to be shared between different processes; I honestly doubt we have encountered such a case in practice so far, but I'm not sure). But most of the time those are used within the same process between different 3d devices, and it is important to avoid any server calls for those (besides creation / destruction).
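For clarity, this is roughly the special-casing I have in mind, as a hedged sketch (struct shared_fence, server_signal_fence() and the cross_process flag are hypothetical, not existing code): the in-process path signals entirely in user space, and only fences known to be shared across processes take the server round trip.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct shared_fence
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint64_t value;
    bool cross_process;   /* set at creation if the handle leaves the process */
};

static void server_signal_fence(struct shared_fence *fence, uint64_t value)
{
    /* Placeholder for the server round trip; deliberately not shown. */
    (void)fence; (void)value;
}

static void shared_fence_signal(struct shared_fence *fence, uint64_t value)
{
    if (fence->cross_process)
    {
        /* Rare path: other processes may be waiting, go through the server. */
        server_signal_fence(fence, value);
        return;
    }

    /* Common path: devices in the same process, no server call needed;
     * waiters block on the condition variable and re-check the value. */
    pthread_mutex_lock(&fence->mutex);
    fence->value = value;
    pthread_cond_broadcast(&fence->cond);
    pthread_mutex_unlock(&fence->mutex);
}
```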
Then, if we are talking about d3d12 fence semantics, the implementation here doesn't seem to handle the "pulse timeline" case. When a d3d12 fence gets a higher value, all the queued waiters for lower values are supposed to be woken, similar to SetEvent() WRT queued waiters. Meanwhile, the value can be reset to a lower value (either explicitly, or through a queued fence signal which happens to complete later than some other signal setting a higher value; the case is similar to PulseEvent(), or ResetEvent() followed by SetEvent()). This is inherently hard to do without having a real wait queue in place (though, actually, we do have one?). Proton currently does it by maintaining a fence value history with some fixed depth. Maybe it can be done without the history by ordering signals + waits appropriately within the scheduling pass in timeline_thread(), throwing the multi-process shared fence "pulse" case under the bus.
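Roughly, the history-based scheme looks like the following (this is my understanding of the Proton approach; HISTORY_DEPTH and all the names are illustrative, not actual Proton code): a waiter for value V is released if any recorded signal reached V, so a transient high value still wakes it even after the fence has been reset lower.

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_DEPTH 8

struct fence_history
{
    uint64_t values[HISTORY_DEPTH];  /* ring buffer of recent signal values */
    unsigned int head;
    uint64_t current;
};

static void fence_history_signal(struct fence_history *h, uint64_t value)
{
    h->current = value;
    h->values[h->head++ % HISTORY_DEPTH] = value;
}

/* Called for each queued waiter during the scheduling pass. */
static bool fence_history_satisfied(const struct fence_history *h, uint64_t wait_value)
{
    unsigned int i;

    if (h->current >= wait_value)
        return true;
    /* The fence may have been reset lower after a higher signal; check the
     * recent history so that "pulsed" waiters are still woken. */
    for (i = 0; i < HISTORY_DEPTH; ++i)
        if (h->values[i] >= wait_value)
            return true;
    return false;
}
```

The fixed depth is obviously a heuristic: a pulse older than HISTORY_DEPTH signals is lost, which is presumably why ordering signals and waits within the scheduling pass instead might be the cleaner option.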
Then, and this is probably more subtle (though issues from a similar problem in the vkd3d-proton non-shared d3d12 fence implementation were encountered in practice), it seems like both the Proton implementation and this one may allow out-of-order signaling of events. If there are dependent submits signaling (different) fences within a single timeline_thread() scheduling pass, I think the order of signaling is now arbitrary, while it should be made sure that dependent signals are not signaled before the ones they depend on. That is not something which has come up in practice with shared fences so far (though bugs in such a place, depending on subtle timing, are very hard to deal with, and it is better to get this right). But if we imagine doing timeline semaphores this way, that will likely break things at once.
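One way to get the ordering right within a single pass, as an illustrative sketch (struct pending_op and the two helpers are hypothetical): instead of walking the pending list once in arbitrary order, repeat until a fixpoint, so a signal is only applied once the waits it depends on have been satisfied by earlier signals in the same pass.

```c
#include <stdbool.h>
#include <stddef.h>

struct pending_op
{
    struct pending_op *next;
    bool done;
};

/* Assumed helpers: whether all of the op's waits are satisfied, and
 * applying its signals (waking waiters as a side effect). */
extern bool op_waits_satisfied(const struct pending_op *op);
extern void op_apply_signals(struct pending_op *op);

static void scheduling_pass(struct pending_op *ops)
{
    bool progress = true;

    while (progress)
    {
        struct pending_op *op;

        progress = false;
        for (op = ops; op; op = op->next)
        {
            if (op->done || !op_waits_satisfied(op))
                continue;
            /* Safe to signal: everything this op depends on has already
             * signaled, either before the pass or earlier in this loop. */
            op_apply_signals(op);
            op->done = true;
            progress = true;
        }
    }
}
```

This is quadratic in the worst case, but the pending list within one pass should be short; a proper topological order over the dependency edges would achieve the same thing.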