http://bugs.winehq.org/show_bug.cgi?id=11674
--- Comment #272 from Pierre-Loup Griffais bugs.winehq.org@plagman.net 2012-12-25 11:36:46 CST ---

Please keep in mind that all my input so far applies to my original point of leveraging the potential performance gains that the NVIDIA threaded optimizations can provide, not to the general case, where I would expect both updating methods to be roughly equivalent (with the small tradeoff of MapBuffer requiring more address space but fewer copies).
Please see the "Threaded Optimizations" section in the link below for more context:
ftp://download.nvidia.com/XFree86/Linux-x86/313.09/README/openglenvvariables.html
The goal here is to have _minimal_ CPU overhead in the main thread, so the logic there doesn't have any knowledge of the GL state. This greatly improves performance in both CPU-bound and GPU-bound use-cases (since it reduces starvation problems and allows the driver to more easily perform optimizations at the command-stream level, rather than dealing with a single command at a time), at the expense of not interacting well with totally synchronous commands such as all Gets and MapBuffer. Semi-synchronous APIs that were designed with pipelining in mind such as queries are still a fast path.
Currently the only two threading modes exposed are "forced-off" (the default) and "forced-on", which the __GL_THREADED_OPTIMIZATIONS environment variable controls. In the future there will be an "auto" mode similar to Windows where the driver will know to fall out of threaded mode if it detects that the workload uses a large number of synchronous calls, to avoid impairing performance in these cases.
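For anyone who wants to try the "forced-on" mode from inside an application rather than from the shell, here is a minimal sketch. It assumes the value "1" selects the forced-on mode (as the README linked above describes) and that the variable has to be set before the NVIDIA OpenGL library is loaded. Both function names are hypothetical helpers made up for illustration; they are not part of any driver or Wine API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ask for the NVIDIA threaded optimizations by
 * setting __GL_THREADED_OPTIMIZATIONS=1 in this process's environment.
 * Assumption: this is called before the GL library is loaded, since the
 * driver is expected to read the variable at load time.
 * The last argument (0) means an existing user setting is NOT overwritten.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int request_threaded_optimizations(void)
{
    return setenv("__GL_THREADED_OPTIMIZATIONS", "1", 0);
}

/* Hypothetical helper: returns 1 if the variable is currently "1",
 * 0 otherwise (unset or any other value). This only inspects the
 * environment; it cannot tell whether the driver honored the request. */
int threaded_optimizations_requested(void)
{
    const char *value = getenv("__GL_THREADED_OPTIMIZATIONS");
    return value != NULL && strcmp(value, "1") == 0;
}
```

Calling request_threaded_optimizations() at the very start of the program, before any GL or windowing library is initialized, is the intended usage. Leaving an existing value untouched is a deliberate choice so that a user who exported __GL_THREADED_OPTIMIZATIONS=0 can still force the mode off. The README may list further requirements for the threaded mode, so please check it rather than relying on this sketch alone.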
To summarize, I think that in the current state of things, MapBufferRange vs BufferSubData for the regular case are two fast, valid approaches with tradeoffs on each side. If address space is a concern, relying on buffer mappings might be problematic, however.
But to get the best throughput, I recommend enabling the NVIDIA threaded optimizations and using BufferSubData with the invalidation scheme I explained (since it gives the driver similar information to what the MapBuffer path specifies) to unlock further performance gains and be faster than D3D at dynamic buffers.
I hope you're having a great holiday season; best wishes!