On Fri, May 28, 2021 at 7:22 AM Zebediah Figura <z.figura12@gmail.com> wrote:
> Signed-off-by: Zebediah Figura <z.figura12@gmail.com>
> ---
>  dlls/wined3d/cs.c | 37 +++++++++++++++++++++++++++++++++----
>  1 file changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/dlls/wined3d/cs.c b/dlls/wined3d/cs.c
> index 50a08334dab..e6f84134795 100644
> --- a/dlls/wined3d/cs.c
> +++ b/dlls/wined3d/cs.c
> @@ -490,6 +490,7 @@ struct wined3d_cs_update_sub_resource
>      unsigned int sub_resource_idx;
>      struct wined3d_box box;
>      struct wined3d_sub_resource_data data;
> +    /* If data.data is NULL, the structure is followed by the data in memory. */
>  };
>
>  struct wined3d_cs_add_dirty_texture_region
> @@ -2597,6 +2598,7 @@ void wined3d_device_context_emit_blt_sub_resource(struct wined3d_device_context
>  static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const void *data)
>  {
>      const struct wined3d_cs_update_sub_resource *op = data;
> +    const void *src_data = op->data.data ? op->data.data : (const BYTE *)(op + 1);
>      struct wined3d_resource *resource = op->resource;
>      const struct wined3d_box *box = &op->box;
>      unsigned int width, height, depth, level;
> @@ -2617,7 +2619,7 @@ static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const voi
>              goto done;
>          }
>
> -        wined3d_buffer_upload_data(buffer, context, box, op->data.data);
> +        wined3d_buffer_upload_data(buffer, context, box, src_data);
>          wined3d_buffer_invalidate_location(buffer, ~WINED3D_LOCATION_BUFFER);
>          goto done;
>      }
> @@ -2630,7 +2632,7 @@ static void wined3d_cs_exec_update_sub_resource(struct wined3d_cs *cs, const voi
>      depth = wined3d_texture_get_level_depth(texture, level);
>
>      addr.buffer_object = 0;
> -    addr.addr = op->data.data;
> +    addr.addr = src_data;
>
>      /* Only load the sub-resource for partial updates. */
>      if (!box->left && !box->top && !box->front
> @@ -3375,8 +3377,35 @@ static void wined3d_deferred_context_update_sub_resource(struct wined3d_device_c
>          struct wined3d_resource *resource, unsigned int sub_resource_idx, const struct wined3d_box *box,
>          const void *data, unsigned int row_pitch, unsigned int slice_pitch)
>  {
> -    FIXME("context %p, resource %p, sub_resource_idx %u, box %s, data %p, row_pitch %u, slice_pitch %u, stub!\n",
> -            context, resource, sub_resource_idx, debug_box(box), data, row_pitch, slice_pitch);
> +    struct wined3d_cs_update_sub_resource *op;
> +    size_t data_size;
> +
> +    if (resource->type == WINED3D_RTYPE_BUFFER)
> +    {
> +        data_size = box->right - box->left;
> +    }
> +    else
> +    {
> +        const struct wined3d_format *format = resource->format;
> +
> +        data_size = (box->back - box->front - 1) * slice_pitch
> +                + ((box->bottom - box->top - 1) / format->block_height) * row_pitch
> +                + ((box->right - box->left + format->block_width - 1) / format->block_width) * format->block_byte_count;
> +    }
> +
> +    op = wined3d_device_context_require_space(context, sizeof(*op) + data_size, WINED3D_CS_QUEUE_DEFAULT);
> +    op->opcode = WINED3D_CS_OP_UPDATE_SUB_RESOURCE;
> +    op->resource = resource;
> +    op->sub_resource_idx = sub_resource_idx;
> +    op->box = *box;
> +    op->data.row_pitch = row_pitch;
> +    op->data.slice_pitch = slice_pitch;
> +    op->data.data = NULL;
> +    memcpy(op + 1, data, data_size);
> +
> +    wined3d_device_context_acquire_resource(context, resource);
> +    wined3d_device_context_submit(context, WINED3D_CS_QUEUE_DEFAULT);
>  }
So this is, AFAIR, more or less what the staging patchset did. I don't think that we want to put the data in the command stream though.
Always copying the data from the user-provided source to some other storage is definitely the right choice: we don't want to block the application (more or less directly) just to make sure that the data stays around until we need it. That's bad for performance on immediate contexts and simply impossible for deferred ones, so there isn't much to argue there.

At the same time we also want to minimize the number of copies. Ideally, for a GPU resource, we'd upload the data right to the GPU here (e.g. into a persistently mapped buffer), ready to be used whenever it's time. For a CPU resource we'd instead copy the data into its final place in sysmem (what will eventually become its "heap_memory"). We don't quite have the infrastructure for that at the moment, but that would be the goal IMO. What you do in patch 2/2 is more along the lines of this idea.

The sysmem allocation mechanism works okay for the time being and could be extended to handle other types of memory; e.g. it could eventually take charge of the Vulkan allocator, making it generic and usable from GL too. I don't think it has to be managed by each deferred context (i.e. it should probably be per-device), although the current scheme has the advantage of not having to care about concurrency.
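To make the "copy straight into its final place" idea a bit more concrete, here's a rough sketch of what a per-device upload helper could look like. None of these names exist in wined3d today; it's only meant to illustrate the shape of the thing, and it ignores fencing against in-flight GPU reads, wrap-around, and the concurrency point above:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upload heap sub-allocating from a persistently mapped buffer,
 * so the single memcpy() out of the application's pointer is also the upload. */
struct upload_heap
{
    void *map_ptr;   /* persistently mapped GPU memory */
    size_t size;     /* total heap size in bytes */
    size_t head;     /* next free offset */
};

struct upload_slice
{
    void *ptr;       /* CPU-visible pointer to write the data through */
    size_t offset;   /* offset inside the GPU buffer, for binding later */
};

static bool upload_heap_alloc(struct upload_heap *heap, size_t size,
        size_t alignment, struct upload_slice *slice)
{
    size_t offset = (heap->head + alignment - 1) & ~(alignment - 1);

    if (offset + size > heap->size)
        return false;
    slice->ptr = (char *)heap->map_ptr + offset;
    slice->offset = offset;
    heap->head = offset + size;
    return true;
}

/* update_sub_resource() would then do one memcpy() into slice->ptr and record
 * {buffer, offset} in the CS packet instead of carrying the data itself. */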
So, at least in my view, I'd bring the allocator bits from patch 2/2 into patch 1/2 and use that for update_sub_resource, both the deferred and the immediate variants (the latter could be its own patch). You'd only need the DISCARD portion for update_sub_resource, so the rest can (and probably should) stay in the map() patch.
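Concretely, for the deferred path that would mean something along these lines instead of the memcpy() into the packet. This is only a sketch: wined3d_allocator_acquire_upload() and update_data_size() are made-up placeholders for whatever the allocator from 2/2 ends up exporting and for the size computation already in this patch, and keeping the allocation alive until the command list has executed is hand-waved:

static void wined3d_deferred_context_update_sub_resource(struct wined3d_device_context *context,
        struct wined3d_resource *resource, unsigned int sub_resource_idx, const struct wined3d_box *box,
        const void *data, unsigned int row_pitch, unsigned int slice_pitch)
{
    struct wined3d_cs_update_sub_resource *op;
    size_t data_size = update_data_size(resource, box, row_pitch, slice_pitch);
    void *staging;

    /* DISCARD-style allocation: fresh storage every time, nothing to wait for. */
    staging = wined3d_allocator_acquire_upload(context->device, data_size);
    memcpy(staging, data, data_size);

    /* The packet now only carries a pointer; no trailing payload. */
    op = wined3d_device_context_require_space(context, sizeof(*op), WINED3D_CS_QUEUE_DEFAULT);
    op->opcode = WINED3D_CS_OP_UPDATE_SUB_RESOURCE;
    op->resource = resource;
    op->sub_resource_idx = sub_resource_idx;
    op->box = *box;
    op->data.row_pitch = row_pitch;
    op->data.slice_pitch = slice_pitch;
    op->data.data = staging;

    wined3d_device_context_acquire_resource(context, resource);
    wined3d_device_context_submit(context, WINED3D_CS_QUEUE_DEFAULT);
}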