http://bugs.winehq.org/show_bug.cgi?id=11674
--- Comment #267 from Henri Verbeet <hverbeet@gmail.com> 2012-12-21 13:45:08 CST ---
(In reply to comment #266)
> Thanks for your interest; I'm sorry for the confusion, I didn't realize Stefan didn't keep you in the loop following our discussion on that topic. The point I was making is that BufferSubData inherently maps better to dynamic buffer workloads than MapBuffer-based updating in threaded use cases. Since BufferSubData requests can be queued immediately, in-band in both command streams, overhead is kept to a minimum and maximum GPU throughput can be achieved.
It doesn't strike me as something that's really inherent in the API, just specific to the implementation NVIDIA has. You'd have a point for D3D "static" buffers, which would require a synchronized map, but those aren't really supposed to be modified after they're initially created. (Implicitly that also means they're only supposed to be mapped before they're used in any draws, so at most you'd have to wait once for the initial upload to finish before the first draw.) D3D "dynamic" buffers end up using either GL_MAP_UNSYNCHRONIZED_BIT or GL_MAP_INVALIDATE_BUFFER_BIT on glMapBufferRange(), and shouldn't require synchronization with the command stream at all; it's the application's responsibility to ensure all accesses are safe.
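For illustration only (this is a sketch, not wined3d's actual code), the two update paths being discussed look roughly like this in GL terms. It assumes a current GL context with GL 3.0 / ARB_map_buffer_range available, a buffer object already created with glBufferData(..., GL_DYNAMIC_DRAW), and the function names here are made up:

/* Sketch only: assumes a current GL context and a GL_DYNAMIC_DRAW buffer. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

/* D3DLOCK_DISCARD-style map: invalidate the old contents so the driver can
 * hand back fresh storage instead of waiting for draws that still read the
 * previous data. */
static void map_discard(GLuint buffer, const void *data, GLsizeiptr size)
{
    void *ptr;

    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
            GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    if (!ptr)
        return;
    memcpy(ptr, data, size);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}

/* D3DLOCK_NOOVERWRITE-style map: the application promises not to touch data
 * the GPU may still be reading, so no synchronization is requested at all. */
static void map_no_overwrite(GLuint buffer, GLintptr offset,
        const void *data, GLsizeiptr size)
{
    void *ptr;

    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    ptr = glMapBufferRange(GL_ARRAY_BUFFER, offset, size,
            GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    if (!ptr)
        return;
    memcpy(ptr, data, size);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}

/* The approach from comment #266: let the driver queue the copy in-band in
 * its command stream instead of handing the application a pointer. */
static void upload_subdata(GLuint buffer, GLintptr offset,
        const void *data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);
}

The point above is that the two mapped paths don't ask the driver to synchronize with in-flight draws either, so whether glBufferSubData() comes out ahead depends on the driver, not on the API itself.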
> Stefan pointed out that memory usage was a big concern for Wine, and
The issue I suspect he's referring to is more about address space than actual memory usage. Regardless, while that does play a role, the main concern for me is that changing the code here would mean either potentially making things worse for other drivers in order to make things better for NVIDIA, or adding a special codepath for NVIDIA somewhere that's going to bitrot or increase maintenance costs. The latter at least isn't entirely out of the question, but it would require some solid justification.