Re: [PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().

11 Jun 2021


      On 6/11/21 2:44 AM, Matteo Bruni wrote:
...
On Fri, Jun 11, 2021 at 5:40 AM Zebediah Figura (she/her)
zfigura@codeweavers.com wrote:
...
On 5/31/21 6:43 PM, Henri Verbeet wrote:
...
On Mon, 31 May 2021 at 23:45, Zebediah Figura (she/her)
zfigura@codeweavers.com wrote:
...
On 5/31/21 4:53 AM, Henri Verbeet wrote:
...
My (very rough) proposal for this would be to replace the "data" field
in the wined3d_cs_update_sub_resource structure with a
wined3d_bo_address, and then introduce something like the following:
   static bool wined3d_device_context_get_upload_bo(struct

wined3d_device_context *context,
               struct wined3d_resource *resource, unsigned int
sub_resource_idx, const struct wined3d_box *box,
               uint32_t flags, struct wined3d_bo_address *address, void **map_ptr)
       {
           ...
       }
This function strikes me as more than a little confusing, party because
it seems to be returning two partially overlapping things, with mostly
different usages...
i.e. I'd presume the idea is for
wined3d_device_context_update_sub_resource() to retrieve this and then
call something like wined3d_context_copy_bo_address() to copy from
sysmem, which makes sense, but then what are we doing with map_ptr?
Well, wined3d_context_copy_bo_address() needs a wined3d_context, which
you typically don't have on application threads.
The basic idea is that you'd pass "address" to
WINED3D_CS_OP_UPDATE_SUB_RESOURCE, and "map_ptr " to the application.
(In case of map/unmap; for update_sub_resource() you'd just copy the
data into "map_ptr".)
...
Or, if we use this in wined3d_device_context_map(), couldn't we just
call wined3d_context_map_bo_address() on the returned wined3d_bo_address?
Conceptually, sure, you could split this into allocate/map/unmap. In
practice though, you'll always want the upload buffer to be mapped, in
order to copy data into it from system memory. And in fact, if we're
going to return a GPU buffer here, it should generally already be
mapped. (For various reasons; for 64-bit applications, we use
persistent maps if we can; for OpenGL, we want to avoid making GL
calls from application threads; if we have a dedicated upload buffer
belonging to the device context, it would have been mapped during
creation.) When needed, unmap can happen when the upload buffer is
consumed.
Thanks for explaining the rationale there, that does help a lot.
It still feels awkward to me to pass the same information maybe twice,
but I guess the wined3d_bo_address part should basically be considered
opaque above the CS thread.
Yes, as I understand it that's to be used by wined3d to keep track of
the lifetime of the memory block.
...
...
...
...
For deferred contexts you should then be able to do something
(conceptually) very similar to patch 2/2, and in the end we may not
even need map/unmap/update_sub_resource in the
wined3d_device_context_ops structure.
We'd still presumably need an unmap, though, unless the idea above is to
*always* allocate new (GPU or CPU) memory, which seems like a much
bigger step.
Not necessarily, but wined3d_cs_exec_update_sub_resource() can unmap if needed.
...
Unless I'm misreading things, this also seems to imply that
*_resource_sub_resource_map() shouldn't exist anymore, right?
Non-discard maps still exist, and
wined3d_device_context_get_upload_bo() wouldn't be used for those.
So the idea goes like this:
(1) Pass a whole wined3d_bo_address as part of struct
wined3d_cs_update_sub_resource.
(2) Introduce wined3d_device_context_get_upload_bo() to allocate memory.
At first, only use it for deferred contexts.
(3) Use wined3d_device_context_get_upload_bo() for immediate contexts as
well, for both DISCARD and NOOVERWRITE maps and for update_sub_resource.
This implies always allocating new CPU/GPU memory, going through the
DEFAULT queue, and not waiting for the upload to complete.
wined3d_device_context_get_upload_bo() should probably avoid the
queues entirely, at least in the normal case. More below.
Right, sorry, I don't mean that wined3d_device_context_get_upload_bo() 
goes through the queues, but that the WINED3D_CS_UPDATE_SUB_RESOURCE 
goes through the DEFAULT queue instead of the MAP queue once it's submitted.
...
...
The idea that map/unmap/update_sub_resource can go away seems to imply
(3) to me, but I'm more than a little curious as to whether it's
actually desirable in all cases. Even with persistent mapping you still
need to allocate new memory, which isn't exactly cheap, especially with
large resources.
You don't need to actually allocate new memory every time, just return
memory that you know is available at the moment. The specific
mechanism used by wined3d_device_get_upload_bo() isn't important to
the callers but yes, the API we want would in fact return a new,
independent memory area for each DISCARD map. It's the only way to
make things fast with constant buffer updates, to name the most
critical example. For some of those you'll easily have thousands of
DISCARD maps per frame and if you don't want to block on buffer maps
and e.g. you have one frame latency to play with, that means having
thousands of "instances" of each buffer moving around at any given
time.
WRT the mechanism, it might be allocating a large chunk of system
memory and suballocating from it, keeping track of the lifetime of
each chunk as it is returned to the application and then back to
wined3d. Or suballocating from GPU memory. It could even be creating a
ton of GL buffers and flipping them around as needed, it doesn't
matter; the idea is that each memory chunk can be reused, once we know
the memory is available again because it's not used by wined3d (e.g.
if it's a block of system memory that's been uploaded to the GPU) or
the GPU (e.g. if we directly returned the memory where the GPU
resource is stored) anymore. Ideally, when the application is in a
steady state, there should be next to no actual memory allocation or
GL / Vulkan resource creation. But, in principle, everything would
work just as well (if not as fast) if we allocated system memory every
time wined3d_device_context_get_upload_bo() is called and freed it
once wined3d is done with uploading the data that was stored into it
by the application.
FWIW, the DYNAMIC d3d usage is supposed to flag the resources that are
going to be updated frequently and are performance sensitive. In
principle we could reserve wined3d_device_context_get_upload_bo() to
DYNAMIC resources only.
...
And then there's the aforementioned non-discard maps—don't we still need
an explicit WINED3D_CS_OP_UNMAP for those? Or any case (are there any?)
where we don't want to persistently map a resource?
We might want to keep the existing CS operations for these cases,
sure. I suspect it will become clearer further down the road.
Okay, many thanks to you and Henri for those answers; I think I have a 
clear enough picture now.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

Re: [PATCH 1/2] wined3d: Implement wined3d_deferred_context_update_sub_resource().