No, we shouldn't get there, ideally update_sub_resource should avoid syncing there. But for one, that's an implementation detail of USR, and it is also something that USR may change.
E.g. if the buffer is idle, USR may one day decide to copy the data directly into the destination buffer instead of a HeapAlloc'ed temp buffer. Avoiding two up-to-4k memcpys should make up for assigning values in a handful of buffers, but I haven't benchmarked this.
The reason why I ran into this is that I semi-accidentally eliminated the sync "acceleration" in map_upload_bo, thinking it was intended as a broken fallback for async maps without ARB_buffer_storage. That surprisingly broke rendering because the USR wait_resource codepath didn't work...
Semi-related to this patch, the current heap_alloc code in map_upload_bo has a few flaws and I think it should go away:
1. It HeapAlloc/HeapFree's on every invocation, and thus on every SetShaderConstant. That's worse than setting values on a few buffers. 2. It returns non-aligned buffers in case of maps - but that is easy to fix 3. It throws away existing data without the app setting DISCARD, which is a fatal flaw for maps
If we want to avoid buffer fencing in draws that should be more thorough than relying on implementation details. E.g. we should be able to get rid of most buffer fences for at least d3d10 in draws because we either draw from GPU-only resources (can't be mapped) or dynamic resources (app is responsible). Sync maps are only possible on staging resources, so we only have to fence things like CopyResource. If the d3d7-9 managed resource code is careful I think we can extend the same logic to earlier API versions. But that's beyond what I want to look into right now.