Hi, this is intended mostly for the other d3d developers, but since we have quite a number of them now, individual CCs are a lot of work :-)
I attached the patches I currently have in my tree to give an update on what I've been working on recently. The main aim of these patches is to reduce draw overhead a bit, thus improving game performance. The patches need some cleanup, but for that I first need a patch Matteo is working on.
Feedback is welcome. I'm also interested in test results, e.g. whether the changes break a game, or what the performance impact is. I would be happy if these patches yield a 5% performance increase.
Patches 1-3: Mostly unrelated. I haven't sent them yet because patch 3 breaks Unigine Heaven, and patches 1 and 2 make little sense without 3.
Patch 4: This removes a hack that works around a driver bug. I have to do more testing on my old machines to find out if the bug is really fixed in newer Nvidia drivers.
Patches 5, 6: They keep track of changes to the framebuffer setup so we don't have to run through the code that figures out which FBO to bind every draw. Patch 5 gets rid of the ordering assumption. Patch 6 applies the FBO only when needed.
They aren't ready yet. In patch 6 the FBO may have to be reapplied when the pixel shader changes. To implement that I need some draw buffer tracking infrastructure Matteo is working on. Clears can also be integrated; fbo-clear.diff is a half-baked attempt to do this. I dropped it when I realized I was duplicating Matteo's work. After that I have to double-check that I took care of all situations where the FBO may have to be updated.
Furthermore, Matteo says that not calling context_apply_draw_buffers every time framebuffer() is run is a noticeable performance improvement too. Matteo, did you test this with just patch 0005, or both 0005 and 0006?
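To illustrate the idea behind patches 5 and 6, here is a minimal sketch of dirty-flag tracking for the framebuffer setup. It is not the actual wined3d code: every struct, field and function name below is a hypothetical stand-in, and the expensive FBO selection is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not real wined3d identifiers. */
struct fb_state
{
    unsigned int render_target;   /* id of the color target the app selected */
    unsigned int depth_stencil;   /* id of the depth/stencil target, 0 = none */
};

struct device
{
    struct fb_state fb;           /* framebuffer setup requested by the app */
    bool fb_dirty;                /* true when fb changed since the last apply */
    unsigned int fbo_applies;     /* counts trips through the expensive path */
};

void device_init(struct device *dev)
{
    assert(dev != NULL);
    dev->fb.render_target = 1;
    dev->fb.depth_stencil = 0;
    dev->fb_dirty = true;         /* nothing has been applied yet */
    dev->fbo_applies = 0;
}

/* Record a render target change; mark the FBO dirty only on a real change. */
void device_set_render_target(struct device *dev, unsigned int rt)
{
    assert(dev != NULL);
    if (dev->fb.render_target == rt)
        return;
    dev->fb.render_target = rt;
    dev->fb_dirty = true;
}

/* Placeholder for the costly "figure out which FBO to bind" code. */
static void apply_fbo(struct device *dev)
{
    ++dev->fbo_applies;
}

/* Per-draw entry point: the FBO is reapplied only if something changed. */
void device_draw(struct device *dev)
{
    assert(dev != NULL);
    if (dev->fb_dirty)
    {
        apply_fbo(dev);
        dev->fb_dirty = false;
    }
    /* The actual draw call would be issued here. */
}
```

In this sketch, two consecutive calls to device_draw() after device_init() go through apply_fbo() only once, and a device_set_render_target() call with a different target triggers exactly one more apply on the next draw. A real implementation would also have to set the flag in every other situation that can invalidate the binding, such as the pixel shader change mentioned for patch 6.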
Patch 0007: Sampler map optimization. It has a lengthy description in the patch file.
Patch 0008: A tiny fix that results in a pretty small improvement on OSX. On Linux+Nvidia it is not noticeable.
Patch 0009: At first I tried to skip the render target dirtification entirely via a flag in the d3ddevice, but it was pretty tricky and ugly. Just making it cheaper gets us about two thirds of the way too. (Draw overhead tester performance without this: 259 fps. Complete disabling of the dirtification calls via a hack: 275 fps. With this patch: 269 fps.)
Patch 0010: An unrelated cleanup.
Patches 11, 12: Preparation for including clears in the fbo dirtification patches. See fbo-clear.diff.
More work on performance is obviously required, for example:
*) Separate vertex declaration, vertex shader and pixel shader states
*) Speed up sampler preloading. This will be easier once we have a tree-like state structure.
*) Write more tests for other common operations, like clears, blits, shader changes, texture changes, vertex buffer changes, dynamic resource loading
*) Test our shaders' GPU-side execution performance
*) See if we can do something about locking
*) Isolate bottlenecks in the GPU drivers and get them fixed.