Hi,
Before the holidays I spent some time optimizing the command stream (CS) resource fencing code. The current state is attached for review. I'll send it for upstreaming after the code freeze.
The basic idea is to use the default queue's head and tail positions for fencing. This completely removes any fencing work on the command stream thread side, and on the main thread it turns an interlocked operation into a simple assignment. Together with patch 4, which is technically unrelated, it improves a microbenchmark I wrote for this (https://github.com/stefand/perftest/tree/main/resource_tracking_d3d11) from ~200 fps to ~700 fps on my Ryzen CPU. Other CPUs see lower gains, but the frame rate still more than doubles. It also gives a measurable improvement in Rocket League once other known CS issues are hacked away.
Items for discussion:
1) I am not entirely sure I am handling ULONG / LONG correctly. I guess we could get away with keeping everything as signed LONGs, but technically signed integer overflow is undefined behavior. Interlocked ops take LONG * though...
2) resource_acquire could be renamed to something else
3) Separate read and write timestamps. This should be easy to add on top of the current code.
4) Traversing resource->device->cs->queue in wined3d_resource_acquire is ugly. I'm contemplating passing a const struct wined3d_cs * or the timestamp to it explicitly.
5) We still iterate over a huge number of resources. Does anyone have ideas on how to cut this down?
Happy New Year,
Stefan
FWIW, patch 2/5 is broken where it is; wined3d_device_context_upload_bo() does acquire the resource. Ultimately that just means that the patch needs to be moved later in the series, though.
On Tuesday, 4 January 2022, 18:52:11 CET, Zebediah Figura (she/her) wrote:
> FWIW, patch 2/5 is broken where it is; wined3d_device_context_upload_bo() does acquire the resource. Ultimately that just means that the patch needs to be moved later in the series, though.
Oh indeed, I missed that WINED3D_CS_OP_UPDATE_SUB_RESOURCE is used from two places. Still, it doesn't make much difference in the overall scheme.
I can't move it after patch 3 as-is, because as of patch 3 we can no longer fence map ops, so we'd set a nonsense access time. Merging it into patch 3 would be an option, but I am sure we can do this more elegantly.