2015-02-11 21:53 GMT+01:00 Stefan Dösinger stefan@codeweavers.com:
> This works on cards that don't implement ARB_depth_clamp, like r500 cards.
> Note that texturing is influenced by position.w, not position.z.
>
>  dlls/wined3d/directx.c    |  7 -------
>  dlls/wined3d/state.c      | 33 ++++++++++-----------------------
>  dlls/wined3d/wined3d_gl.h |  2 --
>  3 files changed, 10 insertions(+), 32 deletions(-)
It's probably hard to measure and unlikely to matter in practice, but toggling depth clamping (where supported) might be slightly faster than updating the projection matrix. I'm not NACKing the patch, though.
On 2015-02-11 at 23:56, Matteo Bruni wrote:
> It's probably hard to measure and not going to really matter in practice but toggling the depth clamping (where supported) might be slightly faster than updating the projection matrix.
I'll try to patch my drawprim overhead tester to test this.
I don't expect it to be faster, though. At the very least it depends on the game's behavior. One of the two switches that need to be toggled to disable depth clipping is switching to POSITIONT vertices, in which case we update the projection matrix anyway.
Even if depth clamping has a minor performance advantage in cases where an application constantly uses POSITIONT and switches ZENABLE on and off, I prefer to always use the projection matrix so that there is only one codepath that does this. If there's a huge difference, we may think about two alternative codepaths.
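To illustrate the single-codepath idea, here is a minimal sketch in C. It assumes that "disabling depth clipping via the projection matrix" means collapsing z to a constant for pre-transformed vertices; that is a reading of the discussion, not the code from the patch. The helper names (build_rhw_projection, transform, inside_depth_range) and the column-major layout are hypothetical and do not come from wined3d.

```c
#include <stddef.h>

/* Hypothetical helper, not wined3d code: builds a column-major 4x4 matrix
 * that maps window-space (POSITIONT-style) x/y to GL clip space.
 * With flatten_z set, the z row is zeroed so every vertex lands at z = 0. */
static void build_rhw_projection(float m[16], float width, float height, int flatten_z)
{
    size_t i;

    for (i = 0; i < 16; ++i)
        m[i] = 0.0f;

    m[0]  =  2.0f / width;   /* x: [0, width]  -> [-1, 1] */
    m[12] = -1.0f;
    m[5]  = -2.0f / height;  /* y: [0, height] -> [1, -1], D3D y points down */
    m[13] =  1.0f;

    if (!flatten_z)
    {
        m[10] =  2.0f;       /* z: [0, 1] -> [-1, 1], the GL clip range */
        m[14] = -1.0f;
    }
    /* else: z row stays zero, so clip-space z is always 0 and never clipped */

    m[15] = 1.0f;            /* w passes through unchanged */
}

/* out = m * in, with m stored column-major. */
static void transform(const float m[16], const float in[4], float out[4])
{
    size_t r;

    for (r = 0; r < 4; ++r)
        out[r] = m[r] * in[0] + m[4 + r] * in[1] + m[8 + r] * in[2] + m[12 + r] * in[3];
}

/* Near/far clip test as GL applies it: -w <= z <= w. */
static int inside_depth_range(const float clip[4])
{
    return clip[2] >= -clip[3] && clip[2] <= clip[3];
}
```

In this sketch a vertex with z = 1.5 fails the depth-range test under the regular matrix and passes it under the flattened one, while x, y and w are identical in both cases. Since w is untouched, anything that depends on position.w, such as texturing, would be unaffected.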
On 2015-02-12 at 08:45, Stefan Dösinger wrote:
> On 2015-02-11 at 23:56, Matteo Bruni wrote:
>> It's probably hard to measure and not going to really matter in practice but toggling the depth clamping (where supported) might be slightly faster than updating the projection matrix.
> I'll try to patch my drawprim overhead tester to test this.
Fwiw, I cannot see any performance difference between ARB_depth_clamp and the projection matrix approach in my modified test program.
What I did: Set a POSITIONT vertex
    while (running)
    {
        for (i = 0; i < 1000; i++)
        {
            set_rs(ZENABLE, TRUE);
            draw();
            set_rs(ZENABLE, FALSE);
            draw();
            set_rs(ZENABLE, TRUE);
            draw();
            set_rs(ZENABLE, FALSE);
            draw();
            set_rs(ZENABLE, TRUE);
            draw();
            set_rs(ZENABLE, FALSE);
            draw();
        }
        present();
    }
I.e., I hit the worst case for the new approach. With ARB_depth_clamp this test program runs at 89.5 fps, with matrices at 89.2 fps. The frame rate generally fluctuates by about +/- 5 fps, so the difference is within the noise. Interestingly, Windows runs this test program at 58 fps.
I tested this on Nvidia. I can also test it on r600g if desired, but I don't really expect it to matter.
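For a sense of scale, the two frame rates above can be converted into a per-draw cost difference. The sketch below is only illustrative arithmetic: the function name per_draw_delta_ns is hypothetical, and the figure of 6000 draws per frame is taken from the loop as posted (1000 iterations with six draws each).

```c
/* Hypothetical helper: per-draw time difference, in nanoseconds, implied by
 * two measured frame rates when each frame issues draws_per_frame draws. */
static double per_draw_delta_ns(double fps_fast, double fps_slow, double draws_per_frame)
{
    double frame_fast_ns = 1e9 / fps_fast;  /* frame time at the higher rate */
    double frame_slow_ns = 1e9 / fps_slow;  /* frame time at the lower rate */

    return (frame_slow_ns - frame_fast_ns) / draws_per_frame;
}
```

Calling per_draw_delta_ns(89.5, 89.2, 6000.0) gives a difference of a few nanoseconds per draw, and the reported +/- 5 fps fluctuation is far larger than the 0.3 fps gap it is derived from.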