Yes, mostly what Matteo said.
On Fri, 28 May 2021 at 15:39, Matteo Bruni matteo.mystral@gmail.com wrote:
So this is, AFAIR, more or less what the staging patchset did. I don't think that we want to put the data in the command stream though.
It may not be too bad for deferred contexts, but for immediate contexts probably not, no.
Always copying the data from the user-provided source to some other storage is definitely the right choice: we don't want to block the application (more or less directly) to make sure that the data stays around until we need it. That's bad for performance for immediate contexts and just impossible for deferred, so not much to argue there. At the same time we also want to minimize the amount of copies. Ideally, for a GPU resource, here we'd upload the data right to the GPU (e.g. inside a persistently mapped buffer), ready to be used whenever it's time. For a CPU resource instead we'd copy the data into its final place in sysmem (what will eventually become its "heap_memory"). We don't quite have the infrastructure for it at the moment but that would be the goal IMO. What you do in patch 2/2 is more along the lines of this idea. The sysmem allocation mechanism works okay for the time being and could be extended to handle other types of memory, e.g. it could eventually take charge of the vulkan allocator, making it generic and usable from GL too. I don't think it has to be managed by each deferred context (i.e. it should probably be per-device), although the current scheme has the advantage of not having to care about concurrency.
So, at least in my view, I'd bring the allocator bits from patch 2/2 into patch 1/2 and use that for update_sub_resource, both the deferred and the immediate variants (the latter could be its own patch). You'd only need the DISCARD portion for update_sub_resource, so the rest can (and probably should) stay in the map() patch.
My (very rough) proposal for this would be to replace the "data" field in the wined3d_cs_update_sub_resource structure with a wined3d_bo_address, and then introduce something like the following:
static bool wined3d_device_context_get_upload_bo(struct wined3d_device_context *context, struct wined3d_resource *resource, unsigned int sub_resource_idx, const struct wined3d_box *box, uint32_t flags, struct wined3d_bo_address *address, void **map_ptr) { ... }
For deferred contexts you should then be able to do something (conceptually) very similar to patch 2/2, and in the end we may not even need map/unmap/update_sub_resource in the wined3d_device_context_ops structure.