http://bugs.winehq.org/show_bug.cgi?id=23802
--- Comment #6 from Stefan Dösinger stefandoesinger@gmx.at 2010-08-09 16:13:19 ---
My performance debugging results were not as fruitful as I originally hoped. The patches I attached improve the performance with wined3d on Windows a little bit (22 -> 27 fps), but that is still a far cry from the 53 fps native d3d achieves.
The first one avoids a software fallback and is pretty straightforward. However, the blit code is still in pretty bad shape, so I don't want to send it to wine-patches right away. This change may uncover some other problems, e.g. with backbuffer ORM.
The second one is needed because NFS:Shift uses more than 64 different framebuffer configurations in the main menu. The reconfiguration of framebuffers causes performance issues in the GL driver. This is only a crude workaround. For a proper fix we'd want some FBO cache limitation logic that isn't based on a fixed value. Also, with growing FBO cache sizes we'll need a different approach to organizing the cache; with the current design the cost of finding a specific FBO grows quickly with increased sizes. (NFS:Shift apparently generates mipmaps with FBO_blit, so it blits between many different surfaces. There may be a capability flag issue. D3D supports mipmap autogeneration, which would probably be faster. Wined3d supports it too, so if the game knows about it, it should be able to use it.)
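To illustrate what a differently organized cache could look like, here is a rough sketch of a hash-bucketed FBO cache with least-recently-used eviction. It is not wined3d code: every name in it (fbo_key, fbo_cache, fbo_cache_get, the constants) is hypothetical, surfaces are represented by plain integer ids, and the fbo_id counter only stands in for creating a real GL framebuffer object. The capacity is still a fixed constant here, so this only addresses the lookup cost, not the question of how to size the cache.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_COLOR 4    /* hypothetical: color attachments per configuration */
#define BUCKETS   64   /* hypothetical: hash bucket count, must be a power of two */
#define CAPACITY  256  /* hypothetical: maximum number of cached configurations */

/* One framebuffer configuration; all members are uint32_t, so no padding. */
struct fbo_key { uint32_t color[MAX_COLOR]; uint32_t depth; };

struct fbo_entry {
    struct fbo_key key;
    uint32_t fbo_id;     /* stand-in for the GL framebuffer name */
    uint64_t last_used;  /* tick of the most recent lookup */
    int next;            /* next entry in the same bucket, -1 ends the chain */
    int in_use;
};

struct fbo_cache {
    struct fbo_entry entries[CAPACITY];
    int buckets[BUCKETS];
    uint64_t tick;
    uint32_t next_id;
    unsigned created;    /* number of simulated FBO creations */
};

/* FNV-1a style hash over the attachment ids, reduced to a bucket index. */
static unsigned hash_key(const struct fbo_key *k)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < MAX_COLOR; i++) h = (h ^ k->color[i]) * 16777619u;
    h = (h ^ k->depth) * 16777619u;
    return h & (BUCKETS - 1);
}

static void fbo_cache_init(struct fbo_cache *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < BUCKETS; i++) c->buckets[i] = -1;
}

/* Remove an entry from its bucket chain; real code would delete the FBO here. */
static void unlink_entry(struct fbo_cache *c, int idx)
{
    assert(c->entries[idx].in_use);
    int *link = &c->buckets[hash_key(&c->entries[idx].key)];
    while (*link != idx) link = &c->entries[*link].next;
    *link = c->entries[idx].next;
    c->entries[idx].in_use = 0;
}

/* Return the FBO for a configuration, creating (and possibly evicting) on a miss. */
static uint32_t fbo_cache_get(struct fbo_cache *c, const struct fbo_key *key)
{
    unsigned b = hash_key(key);
    int victim = 0;

    /* Hit path: only the entries in one bucket are compared. */
    for (int i = c->buckets[b]; i != -1; i = c->entries[i].next) {
        if (!memcmp(&c->entries[i].key, key, sizeof(*key))) {
            c->entries[i].last_used = ++c->tick;
            return c->entries[i].fbo_id;
        }
    }

    /* Miss path: take the first free slot, else the least recently used one. */
    for (int i = 0; i < CAPACITY; i++) {
        if (!c->entries[i].in_use) { victim = i; break; }
        if (c->entries[i].last_used < c->entries[victim].last_used) victim = i;
    }
    if (c->entries[victim].in_use) unlink_entry(c, victim);

    struct fbo_entry *e = &c->entries[victim];
    e->key = *key;
    e->fbo_id = ++c->next_id;   /* simulated FBO creation */
    e->last_used = ++c->tick;
    e->next = c->buckets[b];
    e->in_use = 1;
    c->buckets[b] = victim;
    c->created++;
    return e->fbo_id;
}
```

With something like this, a hit only walks one bucket chain instead of the whole cache. The miss path still scans all slots to find a victim, which is probably acceptable if misses are rare, but it would need a proper LRU list if they are not.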
As I mentioned, on Windows this improves the performance by about 20%. On Linux, the non-d3d issues drain all the oxygen set free by these improvements, so the performance only improved from ~16 fps to ~17 fps. However, the 16 fps were already better than the 6 fps I got when I last ran the app. Switching to Linux 2.6.34 may have improved things; I can investigate this more closely if needed (about 1/3rd of the CPU time is spent in the Linux kernel).
The remaining issues in the main menu are in 3 areas:
*) Blits are very expensive. There isn't a specific blit that is expensive. Some blits are integer RGBA blits, some are float blits, some scale up, some scale down. Filtering isn't the issue.
*) Activating the FBOs is expensive; about 10% of total CPU time is spent in glBindFramebuffer (ARB or EXT). I did not find any obvious workaround here.
*) sRGB write correction. This is a GPU limitation, and it won't bother us until we remove the CPU limitations above.