On 3 June 2014 09:41, Stefan Dösinger stefan@codeweavers.com wrote:
As the previous tests show, we can't do anything with this flag.
Sort of. It may make sense to set WINED3D_BUFFER_DOUBLEBUFFER if WINED3DUSAGE_WRITEONLY isn't set.
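A minimal sketch of that suggestion, not actual wined3d code. Only the two flag names come from this thread; the numeric values and the helper name `buffer_flags_from_usage` are assumptions made so the example builds on its own.

```c
#include <stdint.h>

/* Illustrative values only; the real definitions live in the wined3d
 * headers and sources. */
#define WINED3DUSAGE_WRITEONLY      0x00000008u
#define WINED3D_BUFFER_DOUBLEBUFFER 0x00000004u /* placeholder value */

/* Hypothetical helper: request a system memory copy next to the buffer
 * object unless the application promised never to read the buffer back. */
static uint32_t buffer_flags_from_usage(uint32_t usage)
{
    uint32_t flags = 0;

    if (!(usage & WINED3DUSAGE_WRITEONLY))
        flags |= WINED3D_BUFFER_DOUBLEBUFFER;

    return flags;
}
```

With this policy, reads of a non-WRITEONLY buffer could be served from the system memory copy instead of mapping the GL buffer object.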
On 04.06.2014 at 16:02, Henri Verbeet hverbeet@gmail.com wrote:
On 3 June 2014 09:41, Stefan Dösinger stefan@codeweavers.com wrote:
As the previous tests show, we can't do anything with this flag.
Sort of. It may make sense to set WINED3D_BUFFER_DOUBLEBUFFER if WINED3DUSAGE_WRITEONLY isn't set.
Why would we want to do that? The only thing I can think of is if there's a GL driver where reading back the buffer object has the same performance characteristics as reading back a WRITEONLY resource on Windows. So far I haven't seen an application / driver combination with such an issue, but I didn't really look for one either.
On 4 June 2014 18:35, Stefan Dösinger stefandoesinger@gmail.com wrote:
On 04.06.2014 at 16:02, Henri Verbeet hverbeet@gmail.com wrote:
Sort of. It may make sense to set WINED3D_BUFFER_DOUBLEBUFFER if WINED3DUSAGE_WRITEONLY isn't set.
Why would we want to do that? The only thing I can think of is if there's a GL driver where reading back the buffer object has the same performance characteristics as reading back a WRITEONLY resource on Windows. So far I haven't seen an application / driver combination with such an issue, but I didn't really look for one either.
I could imagine buffers being moved from VRAM to GART when they're mapped, which would then make subsequent draws potentially slower. Dynamic buffers are more or less expected to be in GART, but we want static buffers to be in VRAM as much as possible. I could also perhaps imagine the driver keeping a copy in CPU memory instead, but that would then use up address space.
On 2014-06-04 19:21, Henri Verbeet wrote:
I could imagine buffers being moved from VRAM to GART when they're mapped, which would then make subsequent draws potentially slower. Dynamic buffers are more or less expected to be in GART, but we want static buffers to be in VRAM as much as possible. I could also perhaps imagine the driver keeping a copy in CPU memory instead, but that would then use up address space.
I guess those things are possible, but at the moment hypothetical. I don't think we should keep printing the FIXME because of them. If we print anything it would be better to write a FIXME if DYNAMIC is set, but WRITEONLY isn't.
(Yes, I am aware of some buffer handling performance problems on the Nvidia driver, but as far as I understand them they are not about VRAM vs GART placement, but about unneeded synchronization.)
On 5 June 2014 12:32, Stefan Dösinger stefandoesinger@gmail.com wrote:
On 2014-06-04 19:21, Henri Verbeet wrote:
I could imagine buffers being moved from VRAM to GART when they're mapped, which would then make subsequent draws potentially slower. Dynamic buffers are more or less expected to be in GART, but we want static buffers to be in VRAM as much as possible. I could also perhaps imagine the driver keeping a copy in CPU memory instead, but that would then use up address space.
I guess those things are possible, but at the moment hypothetical. I don't think we should keep printing the FIXME because of them. If we print anything it would be better to write a FIXME if DYNAMIC is set, but WRITEONLY isn't.
I suppose you could replace the current FIXME with some kind of d3d_perf WARN in buffer_init(). I'd still prefer that the impact of the flag be benchmarked in various scenarios.
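Roughly what such a check could look like, as a sketch rather than a patch. The usage flag values are illustrative, both function names are hypothetical, and `fprintf()` stands in for a `WARN_(d3d_perf)()` call so the example builds outside the Wine tree.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative values; the real ones come from the wined3d headers. */
#define WINED3DUSAGE_WRITEONLY 0x00000008u
#define WINED3DUSAGE_DYNAMIC   0x00000200u

/* Hypothetical predicate: a dynamic buffer without WRITEONLY is the
 * combination suggested earlier in the thread as worth reporting. */
static int buffer_usage_is_suspicious(uint32_t usage)
{
    return (usage & WINED3DUSAGE_DYNAMIC) && !(usage & WINED3DUSAGE_WRITEONLY);
}

/* Stand-in for the warning that buffer_init() could emit. */
static void buffer_warn_usage(uint32_t usage)
{
    if (buffer_usage_is_suspicious(usage))
        fprintf(stderr, "warn:d3d_perf: dynamic buffer created without WRITEONLY (usage %#x).\n",
                (unsigned int)usage);
}
```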
On 05.06.2014 at 12:40, Henri Verbeet hverbeet@gmail.com wrote:
I suppose you could replace the current FIXME with some kind of d3d_perf WARN in buffer_init(). I'd still prefer that the impact of the flag be benchmarked in various scenarios.
I have done some basic benchmarking of the WRITEONLY flag in combination with DYNAMIC on Windows. The short summary: D3DUSAGE_WRITEONLY has no impact on Nvidia. On AMD GPUs not setting D3DUSAGE_WRITEONLY makes the common CPU->GPU streaming use case slower. If the application maps the buffer with D3DLOCK_READONLY or even reads back its contents, not setting D3DUSAGE_WRITEONLY improves performance considerably.
All tests were run on Windows 7. I have not tested this on Intel.
This is the raw data. The values are frames per second. The GPU is mostly idle in my test application. "draw" means the common write-only use case of buffers, where data is written with DISCARD or NOOVERWRITE maps. "read" writes data the usual way, draws, then performs a read-only map and copies the data from the buffer into a separate block of memory. "lock only" behaves like "read", but does not perform the memcpy.
                          dynamic   dynamic | writeonly
Geforce 650m  draw            925                   980
              read            1.4                   1.4
              lock only       390                   385
X1600         draw            167                   220
              read             45                  1.69
              lock only       159                 11.24
hd5770        draw            157                   345
              read             40                  0.39
              lock only       145                    30
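To put the figures above in relative terms, here is a small sketch that computes, for each row, how much faster or slower the "dynamic | writeonly" case is than plain "dynamic". The numbers are copied from the raw data; the struct and function names are made up for this example.

```c
#include <stdio.h>

struct result
{
    const char *gpu;
    const char *test;
    double dynamic_fps;           /* D3DUSAGE_DYNAMIC only */
    double dynamic_writeonly_fps; /* D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY */
};

static const struct result results[] =
{
    {"Geforce 650m", "draw",      925.0, 980.0},
    {"Geforce 650m", "read",        1.4,   1.4},
    {"Geforce 650m", "lock only", 390.0, 385.0},
    {"X1600",        "draw",      167.0, 220.0},
    {"X1600",        "read",       45.0,   1.69},
    {"X1600",        "lock only", 159.0,  11.24},
    {"hd5770",       "draw",      157.0, 345.0},
    {"hd5770",       "read",       40.0,   0.39},
    {"hd5770",       "lock only", 145.0,  30.0},
};

/* Ratio above 1 means WRITEONLY is faster, below 1 means it is slower. */
static double writeonly_ratio(const struct result *r)
{
    return r->dynamic_writeonly_fps / r->dynamic_fps;
}

static void print_ratios(void)
{
    size_t i;

    for (i = 0; i < sizeof(results) / sizeof(results[0]); ++i)
        printf("%-12s %-9s %.3f\n", results[i].gpu, results[i].test,
               writeonly_ratio(&results[i]));
}
```

Calling `print_ratios()` lists one ratio per row; the AMD "draw" rows come out above 1 and the AMD "read" and "lock only" rows far below 1, matching the summary.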
On 4 July 2014 21:27, Stefan Dösinger stefandoesinger@gmail.com wrote:
I have done some basic benchmarking of the WRITEONLY flag in combination with DYNAMIC on Windows. The short summary: D3DUSAGE_WRITEONLY has no impact on Nvidia. On AMD GPUs not setting D3DUSAGE_WRITEONLY makes the common CPU->GPU streaming use case slower. If the application maps the buffer with D3DLOCK_READONLY or even reads back its contents, not setting D3DUSAGE_WRITEONLY improves performance considerably.
All tests were run on Windows 7. I have not tested this on Intel.
This is the raw data. The values are frames per second. The GPU is mostly idle in my test application. "draw" means the common write-only use case of buffers, where data is written with DISCARD or NOOVERWRITE maps. "read" writes data the usual way, draws, then performs a read-only map and copies the data from the buffer into a separate block of memory. "lock only" behaves like "read", but does not perform the memcpy.
                          dynamic   dynamic | writeonly
Geforce 650m  draw            925                   980
              read            1.4                   1.4
              lock only       390                   385
X1600         draw            167                   220
              read             45                  1.69
              lock only       159                 11.24
hd5770        draw            157                   345
              read             40                  0.39
              lock only       145                    30
Actually, another guess would be that the driver will hand you a write-combined mapping if D3DUSAGE_WRITEONLY is set. (And just always on NVIDIA.) In theory that would be visible through VirtualQuery(). Of course OpenGL doesn't really allow you to control that, other than implicitly through GL_MAP_WRITE_BIT / GL_MAP_READ_BIT and usage hints.
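As an illustration of that implicit control, a sketch of how D3D usage and lock flags might be translated into glMapBufferRange() access bits. This is not what wined3d actually does; the function name is hypothetical, and the constants are repeated from the D3D9 and OpenGL headers so the example builds without them.

```c
#include <stdint.h>

/* Values as in d3d9types.h. */
#define D3DUSAGE_WRITEONLY  0x00000008u
#define D3DLOCK_READONLY    0x00000010u
#define D3DLOCK_NOOVERWRITE 0x00001000u
#define D3DLOCK_DISCARD     0x00002000u

/* Values as in the OpenGL headers. */
#define GL_MAP_READ_BIT              0x0001u
#define GL_MAP_WRITE_BIT             0x0002u
#define GL_MAP_INVALIDATE_BUFFER_BIT 0x0008u
#define GL_MAP_UNSYNCHRONIZED_BIT    0x0020u

/* Hypothetical translation of D3D flags into a GL access mask. */
static uint32_t gl_map_access(uint32_t usage, uint32_t lock_flags)
{
    uint32_t access;

    if (lock_flags & D3DLOCK_READONLY)
        return GL_MAP_READ_BIT;

    access = GL_MAP_WRITE_BIT;
    if (lock_flags & D3DLOCK_DISCARD)
        access |= GL_MAP_INVALIDATE_BUFFER_BIT;
    if (lock_flags & D3DLOCK_NOOVERWRITE)
        access |= GL_MAP_UNSYNCHRONIZED_BIT;

    /* GL does not allow READ together with the invalidate or
     * unsynchronized bits, so only add it to a plain write map. */
    if (!(usage & D3DUSAGE_WRITEONLY) && access == GL_MAP_WRITE_BIT)
        access |= GL_MAP_READ_BIT;

    return access;
}
```

Whether a driver then returns a write-combined mapping for a write-only access mask is up to the driver; the access bits are only a hint in that respect.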
On 2014-07-07 14:09, Henri Verbeet wrote:
Actually, another guess would be that the driver will hand you a write-combined mapping if D3DUSAGE_WRITEONLY is set. (And just always on NVIDIA.) In theory that would be visible through VirtualQuery().
I did a quick test, and this is exactly what happens.
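The test itself is not included in this thread. A check along these lines would be one way to do it: pass the pointer returned by the buffer lock to VirtualQuery() and look for PAGE_WRITECOMBINE in the reported protection. The two helper names are hypothetical, and the fallback definition of PAGE_WRITECOMBINE only exists so the flag test also builds on non-Windows systems.

```c
#ifdef _WIN32
#include <windows.h>
#endif

#ifndef PAGE_WRITECOMBINE
#define PAGE_WRITECOMBINE 0x400 /* value from winnt.h */
#endif

/* Returns 1 if the protection flags describe a write-combined mapping. */
static int protect_is_write_combined(unsigned long protect)
{
    return (protect & PAGE_WRITECOMBINE) != 0;
}

#ifdef _WIN32
/* Returns 1 if ptr lies in a write-combined region, 0 if it does not,
 * and -1 if VirtualQuery() fails. */
static int mapping_is_write_combined(const void *ptr)
{
    MEMORY_BASIC_INFORMATION info;

    if (!VirtualQuery(ptr, &info, sizeof(info)))
        return -1;

    return protect_is_write_combined(info.Protect);
}
#endif
```

On Windows, `mapping_is_write_combined(data)` would be called with the pointer obtained from locking the vertex buffer, once for a buffer created with D3DUSAGE_WRITEONLY and once without.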