RFC: Rework of wined3d cs fencing

4 Jan 2022


Hi,
Before the holidays I spent some time optimizing the CS resource fencing code. 
The current state is attached for review. I'll send it for upstreaming after 
the code freeze.
The basic idea is to use the default queue's head and tail for fencing. This 
completely removes any fencing work on the command stream thread side, and the 
main thread work goes from an interlocked op to a simple assignment. Together 
with the technically unrelated patch 4, it improves a microbenchmark I wrote 
for this (https://github.com/stefand/perftest/tree/main/resource_tracking_d3d11) 
from ~200 fps to ~700 fps on my Ryzen CPU. Other CPUs see smaller gains, but 
still more than double the frame rate. It also produces a measurable 
improvement in Rocket League once other known CS issues are hacked away.
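To illustrate the basic idea, here is a minimal C11 sketch; it is not the code from the attached patches. All names (struct queue, struct resource, queue_time_reached(), resource_mark_used(), resource_idle()) are hypothetical, and it assumes head and tail are free-running 32-bit counters advanced by the main thread and the CS thread respectively. The main thread stamps a resource with the current head after queueing a command that uses it, which is a plain store; the resource is idle once the CS thread's tail has caught up with that stamp, so the CS thread does no per-resource work at all.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command queue. The main thread advances head when it
 * queues a command; the CS thread advances tail after executing one.
 * Both are assumed to be free-running and to wrap modulo 2^32. */
struct queue
{
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
};

/* Hypothetical resource: the queue position just past the last command
 * that referenced it. Only the main thread reads or writes this. */
struct resource
{
    uint32_t access_time;
};

/* True if queue position "now" is at or past "t", assuming the two are
 * less than 2^31 apart. Unsigned subtraction wraps, so this stays
 * correct across counter wraparound. */
static bool queue_time_reached(uint32_t now, uint32_t t)
{
    return (uint32_t)(now - t) < UINT32_C(0x80000000);
}

/* Main thread, after queueing a command that uses the resource: a plain
 * store, no interlocked op, and nothing for the CS thread to undo. A
 * relaxed load suffices because only the main thread writes head. */
static void resource_mark_used(struct resource *r, struct queue *q)
{
    r->access_time = atomic_load_explicit(&q->head, memory_order_relaxed);
}

/* Main thread: the resource is idle once the CS thread's tail has caught
 * up with the recorded position. The acquire load is meant to pair with
 * an (assumed) release store of tail on the CS thread. */
static bool resource_idle(const struct resource *r, struct queue *q)
{
    return queue_time_reached(atomic_load_explicit(&q->tail, memory_order_acquire),
            r->access_time);
}
```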
Items for discussion:
1) I am not entirely sure I handle ULONG vs. LONG correctly. I guess we 
could get away with keeping everything as signed LONGs, but technically 
signed integer overflow is undefined behavior. Interlocked ops take LONG * 
though...
2) resource_acquire could be renamed to something else.
3) Separate read and write timestamps. This should be easy to add on top of 
the current code.
4) Traversing resource->device->cs->queue in wined3d_resource_acquire is ugly. 
I'm contemplating passing a const struct wined3d_cs * or the timestamp to it 
explicitly.
5) We still iterate over a huge number of resources. Does anyone have ideas on 
how to cut this down?
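Items 1, 3 and 4 can be sketched together. The following C99 code is an illustration under stated assumptions, not the attached patch: struct resource_rw, reached() and the resource_* helpers are hypothetical names, and timestamps are assumed to be free-running uint32_t queue positions. Keeping the timestamps unsigned and comparing them only through unsigned subtraction, which is defined to wrap modulo 2^32, avoids signed overflow entirely. Separately, casting a ULONG * to LONG * for the interlocked ops should not by itself violate C's aliasing rules, since an object may be accessed through the signed variant of its type. The helpers take the timestamp as a parameter instead of walking resource->device->cs->queue, and a read-only access only waits for the last queued write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource with separate read and write timestamps, each
 * holding the queue position just past the last command that read or
 * wrote the resource. */
struct resource_rw
{
    uint32_t read_time;
    uint32_t write_time;
};

/* Wrap-safe "now is at or past t" using only unsigned arithmetic: the
 * subtraction is reduced modulo 2^32, so no signed overflow can occur. */
static bool reached(uint32_t now, uint32_t t)
{
    return (uint32_t)(now - t) < UINT32_C(0x80000000);
}

/* The current head is passed in explicitly instead of being looked up
 * through resource->device->cs->queue. */
static void resource_mark_read(struct resource_rw *r, uint32_t head)
{
    r->read_time = head;
}

static void resource_mark_write(struct resource_rw *r, uint32_t head)
{
    r->write_time = head;
}

/* Reading on the main thread only has to wait for the last queued write. */
static bool resource_can_read(const struct resource_rw *r, uint32_t tail)
{
    return reached(tail, r->write_time);
}

/* Writing has to wait for both queued reads and queued writes. */
static bool resource_can_write(const struct resource_rw *r, uint32_t tail)
{
    return reached(tail, r->read_time) && reached(tail, r->write_time);
}
```

One caveat of any wrapping comparison like this: a stamp that falls more than 2^31 positions behind the tail compares as still pending, so long-idle resources would need some handling, such as refreshing or clamping their stamps.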
Happy New Year,
Stefan