Replying to both mails at once, sort of...
On 5/31/21 4:53 AM, Henri Verbeet wrote:
Yes, mostly what Matteo said.
On Fri, 28 May 2021 at 15:39, Matteo Bruni <matteo.mystral@gmail.com> wrote:
So this is, AFAIR, more or less what the staging patchset did. I don't think that we want to put the data in the command stream though.
It may not be too bad for deferred contexts, but for immediate contexts probably not, no.
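(For reference, and to make sure I'm picturing the same thing: by "putting the data in the command stream" I understand appending the source bytes directly after the packet, roughly like the sketch below. The struct is made up for illustration and isn't the actual staging code.)

/* Hypothetical packet layout with the source data embedded inline: the
 * deferred context memcpy()s the application's bytes into the packet
 * itself, so nothing else has to keep them alive until execution. */
struct wined3d_cs_update_sub_resource_embedded
{
    enum wined3d_cs_op opcode;
    struct wined3d_resource *resource;
    unsigned int sub_resource_idx;
    struct wined3d_box box;
    unsigned int row_pitch, slice_pitch;
    size_t data_size;
    BYTE data[1]; /* data_size bytes copied from the application pointer */
};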
Always copying the data from the user-provided source to some other storage is definitely the right choice: we don't want to block the application (more or less directly) to make sure that the data stays around until we need it. That's bad for performance for immediate contexts and just impossible for deferred, so not much to argue there. At the same time we also want to minimize the amount of copies. Ideally, for a GPU resource, here we'd upload the data right to the GPU (e.g. inside a persistently mapped buffer), ready to be used whenever it's time.
To clarify, when talking about uploading directly to the GPU, are you just referring to map() here, or update_sub_resource() as well?
Does it even make sense to use a persistently mapped GPU buffer for a resource mapped/updated by a deferred context? Obligatory disclaimer that I don't know anything about GPU-side performance, but obviously you can't reuse an earlier mapping for UpdateSubResource() or for DISCARD maps, and I'm failing to understand why an application would ever use NOOVERWRITE on a deferred context.
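Just so we're talking about the same pattern: the NOOVERWRITE idiom I have in mind is the usual dynamic-buffer append on an *immediate* context, something like the sketch below (plain d3d11, not wined3d internals; the helper and its parameters are made up for illustration).

#define COBJMACROS
#include <d3d11.h>
#include <string.h>

/* DISCARD grabs a fresh allocation for the whole buffer; later
 * NO_OVERWRITE maps promise not to touch bytes the GPU may still be
 * reading, so the driver can hand back a pointer without stalling or
 * renaming the buffer. */
static void append_to_dynamic_buffer(ID3D11DeviceContext *context,
        ID3D11Buffer *buffer, unsigned int offset, const void *data, unsigned int size)
{
    D3D11_MAPPED_SUBRESOURCE map;
    D3D11_MAP map_type = offset ? D3D11_MAP_WRITE_NO_OVERWRITE : D3D11_MAP_WRITE_DISCARD;

    if (FAILED(ID3D11DeviceContext_Map(context, (ID3D11Resource *)buffer,
            0, map_type, 0, &map)))
        return;
    memcpy((BYTE *)map.pData + offset, data, size);
    ID3D11DeviceContext_Unmap(context, (ID3D11Resource *)buffer, 0);
}

The question is whether that idiom buys us anything when the writes are only replayed later.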
For a CPU resource instead we'd copy the data into its final place in sysmem (what will eventually become its "heap_memory"). We don't quite have the infrastructure for it at the moment but that would be the goal IMO. What you do in patch 2/2 is more along the lines of this idea. The sysmem allocation mechanism works okay for the time being and could be extended to handle other types of memory, e.g. it could eventually take charge of the vulkan allocator, making it generic and usable from GL too. I don't think it has to be managed by each deferred context (i.e. it should probably be per-device), although the current scheme has the advantage of not having to care about concurrency.
In a sense there is no "current scheme", i.e. deferred contexts are tracking which memory belongs to which resource because that *has* to be done per-context, but nobody's managing custom heaps. Unless I misunderstand what's meant here?
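To make the concurrency trade-off concrete: I'd expect a per-device variant to look vaguely like the sketch below. Every name in it is made up; it's only meant to show where the lock comes in, which is exactly what a per-context allocator gets to skip.

#include <windows.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-device sysmem pool for staging data. Chunk reclamation
 * and alignment are omitted; a real version would tie chunk lifetime to
 * CS fences. */
struct sysmem_chunk
{
    struct sysmem_chunk *next;
    size_t capacity, used;
    BYTE data[1];
};

struct device_upload_pool
{
    CRITICAL_SECTION cs; /* shared by all contexts of the device */
    struct sysmem_chunk *chunks;
};

static void *device_upload_pool_alloc(struct device_upload_pool *pool, size_t size)
{
    const size_t chunk_size = size > 0x100000 ? size : 0x100000;
    struct sysmem_chunk *chunk;
    void *ptr = NULL;

    EnterCriticalSection(&pool->cs);
    for (chunk = pool->chunks; chunk; chunk = chunk->next)
    {
        if (chunk->capacity - chunk->used >= size)
            break;
    }
    if (!chunk && (chunk = malloc(offsetof(struct sysmem_chunk, data[chunk_size]))))
    {
        chunk->capacity = chunk_size;
        chunk->used = 0;
        chunk->next = pool->chunks;
        pool->chunks = chunk;
    }
    if (chunk)
    {
        ptr = &chunk->data[chunk->used];
        chunk->used += size;
    }
    LeaveCriticalSection(&pool->cs);
    return ptr;
}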
So, at least in my view, I'd bring the allocator bits from patch 2/2 into patch 1/2 and use that for update_sub_resource, both the deferred and the immediate variants (the latter could be its own patch). You'd only need the DISCARD portion for update_sub_resource, so the rest can (and probably should) stay in the map() patch.
My (very rough) proposal for this would be to replace the "data" field in the wined3d_cs_update_sub_resource structure with a wined3d_bo_address, and then introduce something like the following:
static bool wined3d_device_context_get_upload_bo(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx,
        const struct wined3d_box *box, uint32_t flags,
        struct wined3d_bo_address *address, void **map_ptr) { ... }
This function strikes me as more than a little confusing, partly because it seems to return two partially overlapping things that are used in mostly different ways...
That is, I'd presume the idea is for wined3d_device_context_update_sub_resource() to retrieve the wined3d_bo_address and then call something like wined3d_context_copy_bo_address() to copy from sysmem, which makes sense; but then what are we doing with map_ptr?
Or, if we use this in wined3d_device_context_map(), couldn't we just call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
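Spelling out my two readings, in case that helps. The get_upload_bo signature is the one quoted above; wined3d_device_context_emit_update_sub_resource() is a name I made up for whatever ends up queueing the packet, and the rest assumes the usual wined3d_private.h types.

/* Reading 1: update_sub_resource allocates a bo and copies into it; here
 * map_ptr is (apparently?) how the copy happens, and "addr" is what ends
 * up in the packet. */
static HRESULT sketch_update_sub_resource(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx,
        const struct wined3d_box *box, const void *data, size_t size)
{
    struct wined3d_bo_address addr;
    void *map_ptr;

    if (!wined3d_device_context_get_upload_bo(context, resource, sub_resource_idx,
            box, WINED3D_MAP_WRITE | WINED3D_MAP_DISCARD, &addr, &map_ptr))
        return E_OUTOFMEMORY;

    /* ...or are we supposed to wrap "data" in a wined3d_bo_address and go
     * through wined3d_context_copy_bo_address() instead? */
    memcpy(map_ptr, data, size);

    wined3d_device_context_emit_update_sub_resource(context, resource,
            sub_resource_idx, box, &addr); /* hypothetical name */
    return WINED3D_OK;
}

/* Reading 2: map just hands the pointer back to the caller, in which case
 * "addr" only matters for the eventual unmap/upload, and calling
 * wined3d_context_map_bo_address() on it would seemingly work as well. */
static HRESULT sketch_map(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx,
        struct wined3d_map_desc *map_desc, const struct wined3d_box *box, uint32_t flags)
{
    struct wined3d_bo_address addr;
    void *map_ptr;

    if (!wined3d_device_context_get_upload_bo(context, resource, sub_resource_idx,
            box, flags, &addr, &map_ptr))
        return E_OUTOFMEMORY;

    map_desc->data = map_ptr; /* pitches omitted in this sketch */
    return WINED3D_OK;
}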
For deferred contexts you should then be able to do something (conceptually) very similar to patch 2/2, and in the end we may not even need map/unmap/update_sub_resource in the wined3d_device_context_ops structure.
We'd still presumably need an unmap, though, unless the idea above is to *always* allocate new (GPU or CPU) memory, which seems like a much bigger step.
Unless I'm misreading things, this also seems to imply that *_resource_sub_resource_map() shouldn't exist anymore, right?
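In other words, I'm reading the proposed end state as something like the following for the ops structure. Everything except the get_upload_bo signature (which is quoted from above) is my own guess, purely to spell out the unmap question:

struct wined3d_device_context_ops
{
    /* ...require_space, submit, etc. unchanged; map, unmap and
     * update_sub_resource gone... */
    bool (*get_upload_bo)(struct wined3d_device_context *context,
            struct wined3d_resource *resource, unsigned int sub_resource_idx,
            const struct wined3d_box *box, uint32_t flags,
            struct wined3d_bo_address *address, void **map_ptr);
    /* Guessed: something presumably still has to release or flush a
     * mapping that didn't come from freshly allocated memory. */
    bool (*unmap_upload_bo)(struct wined3d_device_context *context,
            struct wined3d_resource *resource, unsigned int sub_resource_idx);
};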