https://bugs.winehq.org/show_bug.cgi?id=38166
--- Comment #26 from Paul Gofman gofmanp@gmail.com ---
For the record, here is what is going on. The application creates a big vertex buffer, ~25MB in size, and constantly updates it by locking the whole buffer. Most of the time the buffer is locked with _NOOVERWRITE, and that works fine. But sometimes the application starts locking the buffer without any flags (still requesting a lock of the whole buffer). Such a lock is interleaved with multiple _NOOVERWRITE locks of the same buffer. The application is probably wary of this and never does such a lock often, approximately once per frame.

While this is probably not supposed to work very efficiently in any d3d implementation, for some reason such infrequent locks (which result in the buffer being loaded to system memory, locked there, and uploaded to the GPU again upon subsequent use) cause a disastrous slowdown with the Nvidia proprietary driver. Loading the buffer to sysmem and back takes a good fraction of a second, and once that mode starts, it is often accompanied by the following messages:
0031:err:d3d:wined3d_debug_callback 0x1ba470: "GL_OUT_OF_MEMORY error generated. Failed to allocate CPU address space mapping for texture (consider building 64-bit app).".
This is reported to happen with the Nvidia proprietary driver only. I also tested with the Nouveau driver on the same GPU, and the problem does not appear with it.
So it looks like a memory management peculiarity or issue in the Nvidia driver. I am not sure my patch attached to this bug is really an improvement in the general case; this logic probably needs some neater heuristics before it would give a speedup anywhere besides this game.
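
As a rough illustration of the lock pattern described above, here is a toy Python model that only counts the bytes moved between GPU and system memory. It is a sketch under stated assumptions, not wined3d code: `ToyBuffer`, `simulate`, the two-location cost model, and the 100 locks per frame are all hypothetical. Only the ~25MB size, the roughly once-per-frame flagless lock, and the `D3DLOCK_NOOVERWRITE` flag come from the report.

```python
# Toy model of the lock pattern; NOT wined3d code. Every name here is
# hypothetical, and the cost model (whole-buffer copies only) is an assumption.

D3DLOCK_NOOVERWRITE = 0x00001000  # flag value as defined in d3d9types.h

BUFFER_SIZE = 25 * 1024 * 1024  # ~25MB, as in the report


class ToyBuffer:
    """Tracks where the current copy lives and counts bytes moved."""

    def __init__(self, size):
        self.size = size
        self.location = "gpu"
        self.bytes_transferred = 0

    def lock(self, flags=0):
        if flags & D3DLOCK_NOOVERWRITE:
            # The app promises not to overwrite data in use: no copy modeled.
            return
        if self.location == "gpu":
            # Flagless whole-buffer lock: load the buffer into system memory.
            self.bytes_transferred += self.size
            self.location = "sysmem"

    def draw(self):
        if self.location == "sysmem":
            # Subsequent use on the GPU: upload the whole buffer again.
            self.bytes_transferred += self.size
            self.location = "gpu"


def simulate(frames, locks_per_frame, flagless_every):
    """Return bytes moved; flagless_every=0 means _NOOVERWRITE locks only."""
    buf = ToyBuffer(BUFFER_SIZE)
    for frame in range(frames):
        for i in range(locks_per_frame):
            # At most one flagless lock per frame, on every N-th frame.
            flagless = (flagless_every > 0 and i == 0
                        and frame % flagless_every == 0)
            buf.lock(0 if flagless else D3DLOCK_NOOVERWRITE)
            buf.draw()
    return buf.bytes_transferred


if __name__ == "__main__":
    mib = 1024 * 1024
    print("NOOVERWRITE only:      ", simulate(60, 100, 0) // mib, "MiB moved")
    print("one flagless per frame:", simulate(60, 100, 1) // mib, "MiB moved")
```

In this model a single flagless lock per frame is enough to move the whole buffer twice per frame (once down to sysmem, once back up), while the _NOOVERWRITE locks around it cost nothing. The model says nothing about why the transfer is so slow on one particular driver; it only shows how much data the pattern forces through that path.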