http://bugs.winehq.org/show_bug.cgi?id=11674
--- Comment #268 from Pierre-Loup Griffais <bugs.winehq.org@plagman.net> 2012-12-21 14:38:00 CST ---
(In reply to comment #267)
D3D "dynamic" buffers end up using either GL_MAP_UNSYNCHRONIZED_BIT or GL_MAP_INVALIDATE_BUFFER_BIT on glMapBufferRange(), and shouldn't require synchronization with the command stream at all; it's the application's responsibility to ensure all accesses are safe.
It's still a synchronous API that needs to return a pointer immediately, though, so it stalls the CPU command stream between the main and worker thread in the driver, which can in turn introduce GPU starvation.
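For reference, here's a minimal sketch of how those two D3D dynamic lock semantics translate to the glMapBufferRange() flags mentioned above, and where the synchronous pointer return comes in. This is not wined3d code; the helper name and the lock enum are made up for illustration, and it assumes a GL 3.0+ context with a loader (e.g. GLEW) exposing glMapBufferRange():

    #include <string.h>
    #include <GL/glew.h>  /* or any loader that provides glMapBufferRange() */

    /* Illustrative only, not the wined3d code path. */
    enum dynamic_lock
    {
        LOCK_DISCARD,      /* D3DLOCK_DISCARD: previous contents can be thrown away */
        LOCK_NOOVERWRITE,  /* D3DLOCK_NOOVERWRITE: app promises not to touch in-flight data */
    };

    static void upload_dynamic_data(GLuint buffer, enum dynamic_lock lock,
            GLintptr offset, GLsizeiptr size, const void *data)
    {
        GLbitfield flags = GL_MAP_WRITE_BIT;

        if (lock == LOCK_DISCARD)
            flags |= GL_MAP_INVALIDATE_BUFFER_BIT;  /* driver may hand back fresh storage */
        else
            flags |= GL_MAP_UNSYNCHRONIZED_BIT;     /* no GPU sync; caller guarantees safety */

        glBindBuffer(GL_ARRAY_BUFFER, buffer);
        /* The map must still return a pointer right away, which is where the
         * stall between the driver's main and worker threads comes from. */
        void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, offset, size, flags);
        memcpy(ptr, data, size);
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }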
> The issue I suspect he's referring to is more about address space than actual memory usage.
Yes, and that makes keeping a full mapping of the buffer undesirable compared to pushing all sub-updates in-band and freeing the corresponding chunks of memory as they go through.
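To make that alternative concrete, the in-band path would look roughly like the following (again just a sketch with a made-up helper name): each partial update goes through the command stream, and the driver can drop its staging copy once the GPU has consumed it, so no long-lived mapping of the buffer has to sit in the client's address space:

    static void update_buffer_region(GLuint buffer, GLintptr offset,
            GLsizeiptr size, const void *data)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buffer);
        /* The driver copies 'data' into its command stream (or a transient
         * staging chunk) and can free that chunk as soon as it has been read,
         * instead of keeping the whole buffer mapped. */
        glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);
    }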
> Regardless, although that consideration does play a role, the main consideration for me is that changing the code here would mean either potentially making things worse for other drivers in order to make things better for NVIDIA, or adding a special codepath for NVIDIA somewhere that's going to bitrot or increase maintenance costs. While the latter at least isn't entirely out of the question, it would require some solid justification.
You'd think the performance gains I demonstrated (and that Stefan observed in real-world usage) would be justification enough. Since this is the path that maps best to D3D dynamic buffers (regardless of what current implementations do with it), I believe it makes sense to at least have it available somewhere, if not by default yet, and then attempt to improve drivers if any performance problems are uncovered. That certainly seems like a better approach than trying to fix drivers to optimize paths for usage patterns that don't make sense, at the cost of introducing overhead for usage patterns that do.
Thanks! - Pierre-Loup