On 28 February 2012 13:46, Stefan Dösinger <stefan@codeweavers.com> wrote:
> buffer_direct_upload is working correctly and the GL buffer is mapped with GL_MAP_UNSYNCHRONIZED_BIT or GL_MAP_INVALIDATE_BUFFER_BIT, depending on the lock flags the buffer used. The performance hit comes from the memcpy, not the buffer map. This is particularly bad in 30019 because d3d7 forces us to memcpy the entire buffer. To make sure the driver is still forced to upload data, I replaced the memcpy with some dummy writes to the start and the end of the buffer.
Ok, but we could just be more clever about doing uploads and only upload what's actually referenced by draws. We already have most of the code for tracking uploaded ranges. Similarly, are we sure that maintaining a sysmem copy of the entire buffer is really better than mapping the VBO for drawStridedSlow() in the first place? At some point the data needs to be sent to the GPU, and I think we're in a better position than the driver to decide what to upload and when.
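
To make the "only upload what's actually referenced by draws" idea concrete, here is a rough sketch in C. It is not the existing wined3d range tracking code. The names (struct byte_range, struct dirty_tracker, tracker_mark_dirty(), tracker_collect_uploads(), MAX_RANGES) are hypothetical and only for illustration. The sketch assumes half-open byte ranges and leaves the actual GL upload to the caller, for example one glBufferSubData() call or one mapped copy per returned range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration types; these are not wined3d structures. */
#define MAX_RANGES 16

struct byte_range
{
    size_t start, end; /* Half-open: [start, end). */
};

struct dirty_tracker
{
    struct byte_range ranges[MAX_RANGES];
    size_t count;
};

/* Record that the application wrote [start, end) in the sysmem copy. */
static void tracker_mark_dirty(struct dirty_tracker *t, size_t start, size_t end)
{
    size_t i;

    assert(start <= end);
    if (start == end)
        return;

    if (t->count == MAX_RANGES)
    {
        /* Out of slots: collapse everything into one covering range.
         * This may over-upload later, but never misses dirty data. */
        for (i = 0; i < t->count; ++i)
        {
            if (t->ranges[i].start < start)
                start = t->ranges[i].start;
            if (t->ranges[i].end > end)
                end = t->ranges[i].end;
        }
        t->count = 0;
    }

    t->ranges[t->count].start = start;
    t->ranges[t->count].end = end;
    ++t->count;
}

/* For a draw that reads [draw_start, draw_end), write the dirty sub-ranges
 * that have to be uploaded to "out" (capacity MAX_RANGES) and return how
 * many there are. Dirty data outside the draw stays tracked for later. */
static size_t tracker_collect_uploads(struct dirty_tracker *t,
        size_t draw_start, size_t draw_end, struct byte_range *out)
{
    struct dirty_tracker next = {0};
    size_t i, n = 0;

    for (i = 0; i < t->count; ++i)
    {
        const struct byte_range *r = &t->ranges[i];
        size_t lo = r->start > draw_start ? r->start : draw_start;
        size_t hi = r->end < draw_end ? r->end : draw_end;

        if (lo >= hi)
        {
            /* Not referenced by this draw; keep it dirty. */
            tracker_mark_dirty(&next, r->start, r->end);
            continue;
        }

        out[n].start = lo;
        out[n].end = hi;
        ++n;

        /* Keep whatever sticks out on either side of the draw range. */
        if (r->start < lo)
            tracker_mark_dirty(&next, r->start, lo);
        if (hi < r->end)
            tracker_mark_dirty(&next, hi, r->end);
    }

    *t = next;
    return n;
}
```

With this, a lock of the entire buffer only marks the range dirty, and the copy cost is paid per draw for the bytes that draw actually reads. Whether that beats a single large memcpy depends on how the application draws, so it would need to be measured.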