On 28 February 2012 12:19, Stefan Dösinger <stefan@codeweavers.com> wrote:
> This is for bug 30019.
> They are a lot slower than using no VBO at all in at least some, if not all games. The reason for the slowdown is the memcpy call, not the buffer map.
I don't find this all that convincing either. My impression is that the reason we have these kinds of problems is mostly that the buffer conversion code is broken. I don't think there are legitimate cases where not creating a VBO should improve performance.
On Tuesday, 28 February 2012, 13:05:36, Henri Verbeet wrote:
> I don't find this all that convincing either. My impression is that the reason we have these kinds of problems is mostly that the buffer conversion code is broken. I don't think there are legitimate cases where not creating a VBO should improve performance.
There's no conversion involved. 30019 uses double-buffered buffers because it falls back to drawStridedSlow for material source tracking. 29897 and 29079 also have double-buffered buffers without conversion, I assume because of instanced draws.
buffer_direct_upload is working correctly and the gl buffer is mapped with GL_MAP_UNSYNCHRONIZED_BIT or GL_MAP_INVALIDATE_BUFFER_BIT, depending on the lock flags the buffer used. The performance hit comes from the memcpy, not the buffer map. This is particularly bad in 30019 because d3d7 forces us to memcpy the entire buffer. I added some dummy writes to the start and the end of the buffer instead of the memcpy to make sure the driver is forced to upload data.
It's not a minor performance difference. In 30019 it's 65 fps vs 0.67 fps. The d3d9 bugs are less severe, but in Serious Sam HD the framerate is still cut in half.
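To make the mapping behaviour described above concrete, here is a minimal sketch of how D3D lock flags might be translated into glMapBufferRange() access bits. The helper name is hypothetical and this is not wined3d's actual code; the pairing of DISCARD with buffer invalidation and NOOVERWRITE with an unsynchronized map is an assumption about which lock flag selects which bit. The constants are redefined with the values from the D3D9 and OpenGL headers so that the fragment compiles on its own.

```c
#include <stdint.h>

/* Values as found in d3d9types.h and the OpenGL headers, redefined here
 * so the sketch is self-contained. */
#define D3DLOCK_NOOVERWRITE          0x00001000u
#define D3DLOCK_DISCARD              0x00002000u

#define GL_MAP_WRITE_BIT             0x0002u
#define GL_MAP_INVALIDATE_BUFFER_BIT 0x0008u
#define GL_MAP_UNSYNCHRONIZED_BIT    0x0020u

/* Hypothetical helper (an assumption, not a wined3d function): choose the
 * access bits for glMapBufferRange() from the application's lock flags. */
static uint32_t gl_map_flags_from_lock_flags(uint32_t lock_flags)
{
    uint32_t gl_flags = GL_MAP_WRITE_BIT;

    if (lock_flags & D3DLOCK_DISCARD)
        /* The old contents are not needed; the driver may hand out fresh storage. */
        gl_flags |= GL_MAP_INVALIDATE_BUFFER_BIT;
    else if (lock_flags & D3DLOCK_NOOVERWRITE)
        /* The app promises not to touch data in use; skip waiting for the GPU. */
        gl_flags |= GL_MAP_UNSYNCHRONIZED_BIT;

    return gl_flags;
}
```

With either bit set the map itself should not stall, which is consistent with the cost showing up in the memcpy rather than in the map.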
On 28 February 2012 13:46, Stefan Dösinger <stefan@codeweavers.com> wrote:
> buffer_direct_upload is working correctly and the gl buffer is mapped with GL_MAP_UNSYNCHRONIZED_BIT or GL_MAP_INVALIDATE_BUFFER_BIT, depending on the lock flags the buffer used. The performance hit comes from the memcpy, not the buffer map. This is particularly bad in 30019 because d3d7 forces us to memcpy the entire buffer. I added some dummy writes to the start and the end of the buffer instead of the memcpy to make sure the driver is forced to upload data.
Ok, but we could just be more clever about doing uploads and only upload what's actually referenced by draws. We already have most of the code for tracking uploaded ranges. Similarly, are we really sure that maintaining a sysmem copy of the entire buffer is really better than mapping the VBO for drawStridedSlow() in the first place? At some point the data needs to be sent to the GPU, and I think we're in a better position to decide what to upload and when than the driver.
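As a rough illustration of uploading only what a draw references, the sketch below intersects the dirty region of the system-memory copy with the bytes a non-indexed draw reads. All names are hypothetical and the single-range bookkeeping is an assumption made for brevity; it does not reflect how wined3d tracks ranges, and it leaves out what happens to the rest of the dirty region after a partial upload.

```c
#include <stddef.h>

/* Half-open byte range [start, end) within a vertex buffer. */
struct byte_range
{
    size_t start;
    size_t end;
};

/* Bytes read by a non-indexed draw of vertex_count vertices starting at
 * first_vertex, for a tightly strided buffer. */
static struct byte_range draw_byte_range(size_t first_vertex, size_t vertex_count, size_t stride)
{
    struct byte_range r;

    r.start = first_vertex * stride;
    r.end = (first_vertex + vertex_count) * stride;
    return r;
}

/* Intersect the dirty region with the region the draw reads. Returns 1 and
 * fills *out when there is something to upload, 0 when the draw only touches
 * data that is already up to date in the VBO. */
static int range_to_upload(struct byte_range dirty, struct byte_range draw, struct byte_range *out)
{
    size_t start = dirty.start > draw.start ? dirty.start : draw.start;
    size_t end = dirty.end < draw.end ? dirty.end : draw.end;

    if (start >= end)
        return 0;

    out->start = start;
    out->end = end;
    return 1;
}
```

For a fully dirty 4096-byte buffer and a draw of 6 vertices starting at vertex 10 with a 32-byte stride, this selects bytes 320 to 512 for upload rather than the whole buffer.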
On Tuesday, 28 February 2012, 14:50:28, Henri Verbeet wrote:
> Ok, but we could just be more clever about doing uploads and only upload what's actually referenced by draws. We already have most of the code for tracking uploaded ranges. Similarly, are we really sure that maintaining a sysmem copy of the entire buffer is really better than mapping the VBO for drawStridedSlow() in the first place? At some point the data needs to be sent to the GPU, and I think we're in a better position to decide what to upload and when than the driver.
Indexed draws will be tricky; how well this works depends on how decent the app-provided index ranges are. It also means we'd have to track which parts of the VB are uploaded and which aren't.
Mapping the VBO in drawStridedSlow might be an option, although I guess that would force the driver to wait until previous draws are done.
All those options are out of reach for Wine 1.4, and beyond Wine 1.4 I'd prefer to fix the problems that force us into drawStridedSlow by implementing a vertex pipeline and proper instancing.
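On the indexed-draw point above: when the app-provided index range cannot be trusted, one fallback is to scan the index data and derive the referenced vertex span directly. The following is only a sketch under that assumption, with hypothetical names and 16-bit indices; the scan is itself a pass over the index data, so it is a trade-off and not a free fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive span of vertex indices referenced by a draw. */
struct vertex_span
{
    uint32_t min_index;
    uint32_t max_index;
};

/* Scan 16-bit indices and record the lowest and highest vertex referenced.
 * Returns 0 for an empty draw, 1 otherwise. */
static int scan_index_span(const uint16_t *indices, size_t index_count, struct vertex_span *out)
{
    size_t i;

    if (!index_count)
        return 0;

    out->min_index = out->max_index = indices[0];
    for (i = 1; i < index_count; ++i)
    {
        if (indices[i] < out->min_index)
            out->min_index = indices[i];
        if (indices[i] > out->max_index)
            out->max_index = indices[i];
    }
    return 1;
}
```

The resulting span, multiplied by the vertex stride, gives the byte range that would need to be present in the VBO before the draw.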