As usual my reply ended up being unwieldy and confusing. Hopefully it's still helpful...
On Mon, May 31, 2021 at 11:45 PM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
On 5/31/21 4:53 AM, Henri Verbeet wrote:
Yes, mostly what Matteo said.
On Fri, 28 May 2021 at 15:39, Matteo Bruni matteo.mystral@gmail.com wrote:
So this is, AFAIR, more or less what the staging patchset did. I don't think that we want to put the data in the command stream though.
It may not be too bad for deferred contexts, but for immediate contexts probably not, no.
Always copying the data from the user-provided source to some other storage is definitely the right choice: we don't want to block the application (more or less directly) to make sure that the data stays around until we need it. That's bad for performance for immediate contexts and just impossible for deferred, so not much to argue there. At the same time we also want to minimize the amount of copies. Ideally, for a GPU resource, here we'd upload the data right to the GPU (e.g. inside a persistently mapped buffer), ready to be used whenever it's time.
To clarify, when talking about uploading directly to the GPU, are you just referring to map() here, or update_sub_resource() as well?
Potentially both cases. A DISCARD map is virtually indistinguishable from update_sub_resource().
Does it even make sense to use a persistently mapped GPU buffer for a resource mapped/updated by a deferred context? Obligatory disclaimer that I don't know anything about GPU-side performance, but obviously you can't reuse an earlier mapping for UpdateSubResource() or for DISCARD maps, and I'm failing to understand why an application would ever use NOOVERWRITE on a deferred context.
I don't have all the details of deferred contexts down, but I'm pretty sure that you can. All you need to know is that the memory block you return to the application is not in use (be it by wined3d or by the underlying API / GPU) by the time the application accesses it. Whether that memory is backed by a buffer object, suballocated from a larger GPU-side resource, or just a random chunk of system memory is an implementation detail.
NOOVERWRITE also makes perfect sense with deferred contexts. The general idea behind DISCARD and NOOVERWRITE is to gradually "fill" a larger resource over a number of draws, without requiring any synchronization between the draws. This is probably largely or entirely obvious but I'll go through the whole story to make sure we're on the same page.
The natural use case goes like this: the application wants to draw some dynamic geometry (i.e. generated at runtime, with a variable primitive count). At initialization time it creates a couple of large buffers (e.g. 1 MB each), one for vertices and one for indices. Then at runtime it maps both with DISCARD, fills them with whatever data is needed for the current draw (e.g. vertex and index data for 100 triangles), keeping track of how much memory is still available in either buffer, unmaps and draws.

After a while it decides that it needs to draw some more dynamic stuff. Now it does something very similar to the previous time, except that if there is still free space available it maps the buffer with NOOVERWRITE and "appends" to the buffer, writing the new data after the area used by the first draw. NOOVERWRITE is a promise to d3d not to touch the data in use by any previous draw, which in practice means that the implementation can return the same memory as the previous map, rather than blocking to make sure that any pending draw using the same buffer has completed.

Eventually the application will run out of space in one buffer or the other. No problem: at that point the application maps the relevant buffer with DISCARD and restarts filling the buffer from the top. Here DISCARD is a different promise to d3d: the application tells d3d that it won't try to make use of the old contents of the buffer. For the implementation this means that an entirely new "physical" buffer can be handed to the application, which will take the place of the previous "physical" buffer once pending draws are done with it. Afterwards the application resumes mapping the buffer with NOOVERWRITE for the following dynamic draws, until the buffer fills up again, and so on and so forth.
Worth mentioning is that the "new" buffer the implementation returns on a DISCARD map might in fact be an old buffer, for example one discarded some time earlier, currently unused and sitting idle. Or it might be a portion of a larger physical buffer that the implementation knows is not in use.
Of course applications can and often do find ways to use those primitives more creatively, but this is the basic idea. Also, this works just as well for immediate or for deferred contexts.
For a CPU resource, on the other hand, we'd copy the data into its final place in sysmem (what will eventually become its "heap_memory"). We don't quite have the infrastructure for that at the moment, but that would be the goal IMO. What you do in patch 2/2 is more along the lines of this idea. The sysmem allocation mechanism works okay for the time being and could be extended to handle other types of memory; e.g. it could eventually take charge of the Vulkan allocator, making it generic and usable from GL too. I don't think it has to be managed by each deferred context (i.e. it should probably be per-device), although the current scheme has the advantage of not having to care about concurrency.
In a sense there is no "current scheme", i.e. deferred contexts are tracking which memory belongs to which resource because that *has* to be done per-context, but nobody's managing custom heaps. Unless I misunderstand what's meant here?
Why? Memory is memory and is a per-device resource. All you need to do is return some available memory to the application whenever it asks for a resource map, and to do that you need to keep track of how memory is used and for how long.
So, at least in my view, I'd bring the allocator bits from patch 2/2 into patch 1/2 and use that for update_sub_resource, both the deferred and the immediate variants (the latter could be its own patch). You'd only need the DISCARD portion for update_sub_resource, so the rest can (and probably should) stay in the map() patch.
My (very rough) proposal for this would be to replace the "data" field in the wined3d_cs_update_sub_resource structure with a wined3d_bo_address, and then introduce something like the following:
static bool wined3d_device_context_get_upload_bo(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx,
        const struct wined3d_box *box, uint32_t flags,
        struct wined3d_bo_address *address, void **map_ptr)
{
    ...
}
This function strikes me as more than a little confusing, partly because it seems to be returning two partially overlapping things, with mostly different usages...
i.e. I'd presume the idea is for wined3d_device_context_update_sub_resource() to retrieve this and then call something like wined3d_context_copy_bo_address() to copy from sysmem, which makes sense, but then what are we doing with map_ptr?
Or, if we use this in wined3d_device_context_map(), couldn't we just call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
As I understand it, the idea is to return map_ptr to the application and keep track of address internally (i.e. it's the reference to the "physical resource" associated with the memory returned via map_ptr). You'll want to keep track of those struct wined3d_bo_address entries, and in particular of when they'll become available again at some point in the future.
For deferred contexts you should then be able to do something (conceptually) very similar to patch 2/2, and in the end we may not even need map/unmap/update_sub_resource in the wined3d_device_context_ops structure.
We'd still presumably need an unmap, though, unless the idea above is to *always* allocate new (GPU or CPU) memory, which seems like a much bigger step.
I'm not entirely on top of the latest changes in wined3d and I'm not sure what we want the various _ops structs to look like eventually, but AFAIU we do want to always allocate new memory for DISCARD maps (and we basically do that already, both for Vulkan and GL, although in a somewhat roundabout fashion). As I mentioned earlier, it doesn't have to be actually new memory; if we happen to have some available memory around we can use it instead.

I think the point is that maps really need to go through wined3d_device_context_ops / the CS only because, sometimes, you need new memory. Once you have a call that caters to that situation, you should be set. All the other situations can ideally be handled on the "client" side, without crossing the CS boundary.

There are a few complications (what about READ maps from a STAGING resource? updates to a non-DYNAMIC buffer?) but those are the special cases; the WRITE-only DISCARD / NOOVERWRITE dynamic buffer map is the interesting use case that deserves to be streamlined. For deferred contexts it's the only allowed case, and that's no coincidence.
Unless I'm misreading things, this also seems to imply that *_resource_sub_resource_map() shouldn't exist anymore, right?
That's a separate question, I think, as that's the API the upper wined3d "layers" use to access physical resources. It might need to be modified along the way to better handle these new concepts, but it seems likely that we'll still need something along those lines.