Changes in version 2 of this patchset:
*) Rename resource_acquire to resource_reference.
*) Simplify the wrap-around logic (thanks Jan!).
Some updated benchmarking information: Keeping a separate counter for fencing instead of re-using head and tail has a measurable performance impact in my draw overhead microbenchmark.
In World of Tanks the full 32 bit head/tail numbers wrap around within 3-4 minutes rather than the few hours I concluded earlier from my microbenchmarks. In a way this is welcome - the wrap-around logic is actually exercised rather than being untested dead code. It might make the phantom waits described in patch 2 a bit more likely though. Should this become an issue I believe we can change head and tail to SIZE_T or ULONGLONG. I could not measure any performance impact of a 64 bit counter vs a 32 bit counter. I tested this with a 32 bit client on a 64 bit CPU. I don't have a multicore 32 bit CPU available for testing in a pure 32 bit setup.
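To illustrate why a wrapping 32 bit counter can still be used for ordering, here is a minimal sketch of a wrap-around-safe comparison. The helper name fence_passed is made up for this example and is not the code from the patches; it only assumes that the tail value and the recorded fence value are less than 2^31 apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the wined3d implementation.
 * Returns true once the consumer's tail counter has reached or passed the
 * fence value recorded by the producer. Unsigned subtraction is performed
 * modulo 2^32, so the result stays correct across a wrap-around as long as
 * the two values are less than 2^31 apart. */
static bool fence_passed(uint32_t tail, uint32_t fence)
{
    return (uint32_t)(tail - fence) < 0x80000000u;
}
```

With this kind of comparison a fence value that is more than half the counter range old looks as if it were still pending, which is one way a wrapping counter can cause an unnecessary wait. A 64 bit counter would make that window practically unreachable.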
This is the patchset described in https://www.winehq.org/pipermail/wine-devel/2022-January/204020.html . It simplifies and speeds up d3d resource tracking in a few ways:
*) Completely remove any burden on the CS thread.
*) Replace interlocked ops on the client thread with a plain assignment.
*) Piggy-back onto the queue's head and tail counters, which we already increment with interlocked ops.
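The three points above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code from the patches: all struct and function names are hypothetical, the counters count commands rather than bytes, and C11 atomics stand in for the interlocked ops.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types, for illustration only. */
struct queue
{
    _Atomic uint32_t head; /* advanced by the client thread on submit */
    _Atomic uint32_t tail; /* advanced by the CS thread after execution */
};

struct resource
{
    uint32_t fence; /* written and read by the client thread only */
};

/* Client thread: submit one command that uses the resource. The queue head
 * is incremented atomically anyway; the resource just remembers the new
 * value with a plain assignment. */
static void submit_with_resource(struct queue *q, struct resource *res)
{
    uint32_t new_head = atomic_fetch_add(&q->head, 1) + 1;
    res->fence = new_head;
}

/* CS thread: finish one command. No per-resource work is needed here. */
static void execute_one(struct queue *q)
{
    atomic_fetch_add(&q->tail, 1);
}

/* Client thread: the resource is idle once the tail has caught up with the
 * recorded fence value. The comparison tolerates counter wrap-around. */
static bool resource_idle(struct queue *q, const struct resource *res)
{
    uint32_t tail = atomic_load(&q->tail);
    return (uint32_t)(tail - res->fence) < 0x80000000u;
}
```

The point of the design is that the only shared state is the pair of counters the queue maintains anyway, so neither thread pays for extra atomic operations per resource.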
I tested the impact with a microbenchmark: https://github.com/stefand/perftest/blob/main/resource_tracking_d3d11/resour...
Depending on the CPU it doubles or triples draw speed in that microbenchmark. In real games the effect is much less pronounced, but I do see about a 2% gain in World of Tanks. I also see a gain in Rocket League, but only if I hack away other known issues with Rocket League (UpdateSubResource in particular).
I have further improvements to resource tracking in mind that can be done on top of these patches:
*) Separate read and write access times.
*) Remove draw and compute tracking for d3d10+ clients and only track staging resources.
Matteo had some ideas to make the queue multi-writer thread safe to further reduce the use of wined3d_cs. This patchset makes this a bit more complicated because the head value cannot be inferred from the return value of require_space() and thus needs to be passed around separately to submit(). This can be done either with thread local storage or via a separate parameter to require_space() and submit().
Stefan Dösinger (5):
  wined3d: Use extra bits in the queue head and tail counters.
  wined3d: Use the default queue index for resource fencing.
  wined3d: Remove the no-op wined3d_resource_release.
  wined3d: Remove the resource_acquire call in resource_cleanup.
  wined3d: Rename resource_acquire to resource_reference.
 dlls/wined3d/cs.c              | 276 +++++++++------------------------
 dlls/wined3d/resource.c        |   2 -
 dlls/wined3d/wined3d_private.h |  68 ++++++--
 3 files changed, 123 insertions(+), 223 deletions(-)