[PATCH v2 0/3] MR2455: wined3d: Avoid VK_IMAGE_LAYOUT_GENERAL.
This series aims to improve GPU-side performance by avoiding VK_IMAGE_LAYOUT_GENERAL for textures that are used as render target and shader resource view. To do so, we have to transition them between VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL depending on use (and likewise for depth stencils). It improves performance of Rocket League from 80 fps to 100 fps in a GPU limited configuration on my Radeon Polaris GPU. This MR is marked as draft for now because I am not convinced by patch 3 yet. For actual submission I think I'll create separate MRs for patches 1-3 and 4-8. Patch 2 introduces a validation layer error in the d3d11 tests that gets fixed in patch 3. No new test failures are introduced, although none of the existing ones are fixed either. -- v2: wined3d: Fall back to general layout if necessary. wined3d: Avoid VK_IMAGE_LAYOUT_GENERAL. wined3d: Sync depth stencils to both early and late depth test. https://gitlab.winehq.org/wine/wine/-/merge_requests/2455
From: Stefan Dösinger <stefan(a)codeweavers.com> We don't know which test the next draw will use when we place the barrier. We may be able to figure this out, but it is not trivial. This avoids a read-after-write hazard after the next patch. The layout transition from shader-read-only to depth stencil needs to be finished before the early depth test in a follow-up draw if this draw uses early depth. --- dlls/wined3d/resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dlls/wined3d/resource.c b/dlls/wined3d/resource.c index f8fa34727bd..471d2f32f03 100644 --- a/dlls/wined3d/resource.c +++ b/dlls/wined3d/resource.c @@ -582,7 +582,7 @@ VkPipelineStageFlags vk_pipeline_stage_mask_from_bind_flags(uint32_t bind_flags) if (bind_flags & WINED3D_BIND_RENDER_TARGET) flags |= VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; if (bind_flags & WINED3D_BIND_DEPTH_STENCIL) - flags |= VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT; + flags |= VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT; if (bind_flags & WINED3D_BIND_STREAM_OUTPUT) flags |= VK_PIPELINE_STAGE_TRANSFORM_FEEDBACK_BIT_EXT; -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/2455
From: Stefan Dösinger <stefan(a)codeweavers.com> This improves GPU-side performance considerably by allowing the driver to keep lossless texture compression enabled for textures that are used as both render taget / depth stencil and shader resource. --- I am not entirely happy with how default_image_info.imageLayout is handled. wined3d_texture_vk_get_default_image_info is called by wined3d_rendertarget_view_vk_get_image_view, although only the image view is taken, and not the entire descriptor. UAVs work because the BIND_UAV flag forces the texture to use VK_IMAGE_LAYOUT_GENERAL. I contemplated making wined3d_texture_vk_get_default_image_info return only the view handle, and not the entire image info, but then we'd need extra storage to store the VkDescriptorImageInfo for each VkWriteDescriptorSet entry. I am not convinced this is better. One possibility is doing away with the default view entirely. --- dlls/wined3d/texture.c | 83 ++++++++++++++++++++++++++++++------------ dlls/wined3d/view.c | 5 ++- 2 files changed, 63 insertions(+), 25 deletions(-) diff --git a/dlls/wined3d/texture.c b/dlls/wined3d/texture.c index 626b0908c68..68d6fe9c5d9 100644 --- a/dlls/wined3d/texture.c +++ b/dlls/wined3d/texture.c @@ -4906,7 +4906,10 @@ const VkDescriptorImageInfo *wined3d_texture_vk_get_default_image_info(struct wi TRACE("Created image view 0x%s.\n", wine_dbgstr_longlong(texture_vk->default_image_info.imageView)); texture_vk->default_image_info.sampler = VK_NULL_HANDLE; - texture_vk->default_image_info.imageLayout = texture_vk->layout; + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; + else + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; return &texture_vk->default_image_info; } @@ -5511,26 +5514,18 @@ BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, if (resource->bind_flags & WINED3D_BIND_UNORDERED_ACCESS) vk_usage |= VK_IMAGE_USAGE_STORAGE_BIT; - texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; - if (wined3d_popcount(resource->bind_flags) == 1) + if (resource->bind_flags & WINED3D_BIND_UNORDERED_ACCESS) + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; + else if (resource->bind_flags & WINED3D_BIND_RENDER_TARGET) + texture_vk->layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; + else if (resource->bind_flags & WINED3D_BIND_DEPTH_STENCIL) + texture_vk->layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; + else if (resource->bind_flags & WINED3D_BIND_SHADER_RESOURCE) + texture_vk->layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; + else { - switch (resource->bind_flags) - { - case WINED3D_BIND_RENDER_TARGET: - texture_vk->layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; - break; - - case WINED3D_BIND_DEPTH_STENCIL: - texture_vk->layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; - break; - - case WINED3D_BIND_SHADER_RESOURCE: - texture_vk->layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; - break; - - default: - break; - } + FIXME("unexpected bind flags %s, using VK_IMAGE_LAYOUT_GENERAL\n", wined3d_debug_bind_flags(resource->bind_flags)); + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; } if (!wined3d_context_vk_create_image(context_vk, vk_image_type, vk_usage, format_vk->vk_format, @@ -5540,6 +5535,10 @@ BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, return FALSE; } + /* We can't use a zero src access mask without synchronization2. Set the last-used bind mask to something + * non-zero to avoid this. */ + texture_vk->bind_mask = resource->bind_flags; + vk_range.aspectMask = vk_aspect_mask_from_format(&format_vk->f); vk_range.baseMipLevel = 0; vk_range.levelCount = VK_REMAINING_MIP_LEVELS; @@ -5706,15 +5705,48 @@ HRESULT wined3d_texture_vk_init(struct wined3d_texture_vk *texture_vk, struct wi flags, device, parent, parent_ops, &texture_vk[1], &wined3d_texture_vk_ops); } +enum VkImageLayout wined3d_layout_from_bind_mask(const struct wined3d_texture_vk *texture_vk, const uint32_t bind_mask) +{ + assert(wined3d_popcount(bind_mask) == 1); + + /* We want to avoid switching between LAYOUT_GENERAL and other layouts. In Radeon GPUs (and presumably + * others), this will trigger decompressing and recompressing the texture. We also hardcode the layout + * into views when they are created. */ + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + return VK_IMAGE_LAYOUT_GENERAL; + + switch (bind_mask) + { + case WINED3D_BIND_RENDER_TARGET: + return VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; + + case WINED3D_BIND_DEPTH_STENCIL: + return VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; + + case WINED3D_BIND_SHADER_RESOURCE: + return VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; + + default: + ERR("Unexpected bind mask %s.\n", wined3d_debug_bind_flags(bind_mask)); + return VK_IMAGE_LAYOUT_GENERAL; + } +} + void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, struct wined3d_context_vk *context_vk, uint32_t bind_mask) { + enum VkImageLayout new_layout; uint32_t src_bind_mask = 0; TRACE("texture_vk %p, context_vk %p, bind_mask %s.\n", texture_vk, context_vk, wined3d_debug_bind_flags(bind_mask)); - if (bind_mask & ~WINED3D_READ_ONLY_BIND_MASK) + new_layout = wined3d_layout_from_bind_mask(texture_vk, bind_mask); + + /* A layout transition is potentially a read-write operation, so even if we + * prepare the texture to e.g. read only shader resource mode, we have to wait + * for past operations to finish. */ + if (bind_mask & ~WINED3D_READ_ONLY_BIND_MASK || new_layout != texture_vk->layout) { src_bind_mask = texture_vk->bind_mask & WINED3D_READ_ONLY_BIND_MASK; if (!src_bind_mask) @@ -5732,8 +5764,9 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, { VkImageSubresourceRange vk_range; - TRACE(" %s -> %s.\n", - wined3d_debug_bind_flags(src_bind_mask), wined3d_debug_bind_flags(bind_mask)); + TRACE(" %s(%x) -> %s(%x).\n", + wined3d_debug_bind_flags(src_bind_mask), texture_vk->layout, + wined3d_debug_bind_flags(bind_mask), new_layout); vk_range.aspectMask = vk_aspect_mask_from_format(texture_vk->t.resource.format); vk_range.baseMipLevel = 0; @@ -5745,7 +5778,9 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, vk_pipeline_stage_mask_from_bind_flags(src_bind_mask), vk_pipeline_stage_mask_from_bind_flags(bind_mask), vk_access_mask_from_bind_flags(src_bind_mask), vk_access_mask_from_bind_flags(bind_mask), - texture_vk->layout, texture_vk->layout, texture_vk->image.vk_image, &vk_range); + texture_vk->layout, new_layout, texture_vk->image.vk_image, &vk_range); + + texture_vk->layout = new_layout; } } diff --git a/dlls/wined3d/view.c b/dlls/wined3d/view.c index 9d62c4a5089..6bdc865b85e 100644 --- a/dlls/wined3d/view.c +++ b/dlls/wined3d/view.c @@ -1190,7 +1190,10 @@ static void wined3d_shader_resource_view_vk_cs_init(void *object) srv_vk->view_vk.u.vk_image_info.imageView = vk_image_view; srv_vk->view_vk.u.vk_image_info.sampler = VK_NULL_HANDLE; - srv_vk->view_vk.u.vk_image_info.imageLayout = texture_vk->layout; + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + srv_vk->view_vk.u.vk_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; + else + srv_vk->view_vk.u.vk_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; } HRESULT wined3d_shader_resource_view_vk_init(struct wined3d_shader_resource_view_vk *view_vk, -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/2455
From: Stefan Dösinger <stefan(a)codeweavers.com> If a texture is used as shader resource and DS/RT at the same time, including cases where a different subresource is bound to each. In theory this could be handled better, but would require per-subresource layout tracking. --- It might make sense to merge this into the previous patch. Without it, we get a validation error in test_sample_attached_rtv. I feel it is easier to review them separately and we have plenty of pre-existing test and validation errors with the vulkan backend. --- dlls/wined3d/context_vk.c | 22 +++++++++++++++++++++- dlls/wined3d/texture.c | 29 +++++++++++++++++++++++++++++ dlls/wined3d/view.c | 8 +++++++- dlls/wined3d/wined3d_private.h | 6 +++++- 4 files changed, 62 insertions(+), 3 deletions(-) diff --git a/dlls/wined3d/context_vk.c b/dlls/wined3d/context_vk.c index 1acb8a8d201..10713985e7e 100644 --- a/dlls/wined3d/context_vk.c +++ b/dlls/wined3d/context_vk.c @@ -2599,6 +2599,14 @@ static bool wined3d_context_vk_begin_render_pass(struct wined3d_context_vk *cont continue; rtv_vk = wined3d_rendertarget_view_vk(view); + + if (rtv_vk->v.resource->bind_count) + { + struct wined3d_texture_vk *texture_vk; + texture_vk = wined3d_texture_vk(wined3d_texture_from_resource(rtv_vk->v.resource)); + wined3d_texture_vk_make_generic(texture_vk, context_vk); + } + vk_views[attachment_count] = wined3d_rendertarget_view_vk_get_image_view(rtv_vk, context_vk); wined3d_rendertarget_view_vk_barrier(rtv_vk, context_vk, WINED3D_BIND_RENDER_TARGET); wined3d_context_vk_reference_rendertarget_view(context_vk, rtv_vk); @@ -2634,6 +2642,14 @@ static bool wined3d_context_vk_begin_render_pass(struct wined3d_context_vk *cont if ((view = state->fb.depth_stencil)) { rtv_vk = wined3d_rendertarget_view_vk(view); + + if (rtv_vk->v.resource->bind_count) + { + struct wined3d_texture_vk *texture_vk; + texture_vk = wined3d_texture_vk(wined3d_texture_from_resource(rtv_vk->v.resource)); + wined3d_texture_vk_make_generic(texture_vk, context_vk); + } + vk_views[attachment_count] = wined3d_rendertarget_view_vk_get_image_view(rtv_vk, context_vk); wined3d_rendertarget_view_vk_barrier(rtv_vk, context_vk, WINED3D_BIND_DEPTH_STENCIL); wined3d_context_vk_reference_rendertarget_view(context_vk, rtv_vk); @@ -3023,7 +3039,11 @@ static bool wined3d_shader_descriptor_writes_vk_add_srv_write(struct wined3d_sha struct wined3d_texture_vk *texture_vk = wined3d_texture_vk(texture_from_resource(resource)); if (view_vk->u.vk_image_info.imageView) + { image_info = &view_vk->u.vk_image_info; + if (image_info->imageLayout != texture_vk->layout) + wined3d_shader_resource_view_vk_update_layout(srv_vk, texture_vk->layout); + } else image_info = wined3d_texture_vk_get_default_image_info(texture_vk, context_vk); buffer_view = NULL; @@ -3389,7 +3409,7 @@ static void wined3d_context_vk_load_shader_resources(struct wined3d_context_vk * { if (!srv_vk->view_vk.bo_user.valid) { - wined3d_shader_resource_view_vk_update(srv_vk, context_vk); + wined3d_shader_resource_view_vk_update_buffer(srv_vk, context_vk); if (pipeline == WINED3D_PIPELINE_GRAPHICS) context_invalidate_state(&context_vk->c, STATE_GRAPHICS_SHADER_RESOURCE_BINDING); else diff --git a/dlls/wined3d/texture.c b/dlls/wined3d/texture.c index 68d6fe9c5d9..e6618825380 100644 --- a/dlls/wined3d/texture.c +++ b/dlls/wined3d/texture.c @@ -5784,6 +5784,35 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, } } +/* This is called when a texture is used as render target and shader resource + * or depth stencil and shader resource at the same time. This can either be + * read-only simultaneos use as depth stencil, but also for rendering to one + * subresource while reading from another. Without tracking of barriers and + * layouts per subresource VK_IMAGE_LAYOUT_GENERAL is the only thing we can do. */ +void wined3d_texture_vk_make_generic(struct wined3d_texture_vk *texture_vk, + struct wined3d_context_vk *context_vk) +{ + VkImageSubresourceRange vk_range; + + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + return; + + vk_range.aspectMask = vk_aspect_mask_from_format(texture_vk->t.resource.format); + vk_range.baseMipLevel = 0; + vk_range.levelCount = VK_REMAINING_MIP_LEVELS; + vk_range.baseArrayLayer = 0; + vk_range.layerCount = VK_REMAINING_ARRAY_LAYERS; + + wined3d_context_vk_image_barrier(context_vk, wined3d_context_vk_get_command_buffer(context_vk), + VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, + VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, + 0, 0, + texture_vk->layout, VK_IMAGE_LAYOUT_GENERAL, texture_vk->image.vk_image, &vk_range); + + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; +} + static void ffp_blitter_destroy(struct wined3d_blitter *blitter, struct wined3d_context *context) { struct wined3d_blitter *next; diff --git a/dlls/wined3d/view.c b/dlls/wined3d/view.c index 6bdc865b85e..a95ff13410f 100644 --- a/dlls/wined3d/view.c +++ b/dlls/wined3d/view.c @@ -1101,7 +1101,7 @@ HRESULT wined3d_shader_resource_view_gl_init(struct wined3d_shader_resource_view return hr; } -void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_vk *srv_vk, +void wined3d_shader_resource_view_vk_update_buffer(struct wined3d_shader_resource_view_vk *srv_vk, struct wined3d_context_vk *context_vk) { const struct wined3d_format_vk *view_format_vk = wined3d_format_vk(srv_vk->v.format); @@ -1120,6 +1120,12 @@ void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_ } } +void wined3d_shader_resource_view_vk_update_layout(struct wined3d_shader_resource_view_vk *srv_vk, + VkImageLayout layout) +{ + srv_vk->view_vk.u.vk_image_info.imageLayout = layout; +} + static void wined3d_shader_resource_view_vk_cs_init(void *object) { struct wined3d_shader_resource_view_vk *srv_vk = object; diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h index f808852ffef..03c4f2ba404 100644 --- a/dlls/wined3d/wined3d_private.h +++ b/dlls/wined3d/wined3d_private.h @@ -4877,6 +4877,8 @@ const VkDescriptorImageInfo *wined3d_texture_vk_get_default_image_info(struct wi HRESULT wined3d_texture_vk_init(struct wined3d_texture_vk *texture_vk, struct wined3d_device *device, const struct wined3d_resource_desc *desc, unsigned int layer_count, unsigned int level_count, uint32_t flags, void *parent, const struct wined3d_parent_ops *parent_ops) DECLSPEC_HIDDEN; +void wined3d_texture_vk_make_generic(struct wined3d_texture_vk *texture_vk, + struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN; BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN; @@ -5594,8 +5596,10 @@ void wined3d_shader_resource_view_vk_generate_mipmap(struct wined3d_shader_resou HRESULT wined3d_shader_resource_view_vk_init(struct wined3d_shader_resource_view_vk *view_vk, const struct wined3d_view_desc *desc, struct wined3d_resource *resource, void *parent, const struct wined3d_parent_ops *parent_ops) DECLSPEC_HIDDEN; -void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_vk *view_vk, +void wined3d_shader_resource_view_vk_update_buffer(struct wined3d_shader_resource_view_vk *view_vk, struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN; +void wined3d_shader_resource_view_vk_update_layout(struct wined3d_shader_resource_view_vk *srv_vk, + VkImageLayout layout) DECLSPEC_HIDDEN; struct wined3d_unordered_access_view { -- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/2455
I updated the patches. The rotate bind flags one is already upstream. The patches optimizing blits are removed because I cannot detect any performance advantage with them with the write-after-write hazard fixed. I'm also removing the draft marker from this merge request. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/2455#note_29010
Jan Sikorski (@jsikorski) commented about dlls/wined3d/texture.c:
TRACE("Created image view 0x%s.\n", wine_dbgstr_longlong(texture_vk->default_image_info.imageView));
texture_vk->default_image_info.sampler = VK_NULL_HANDLE; - texture_vk->default_image_info.imageLayout = texture_vk->layout; + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; + else + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
It looks correct, but fairly brittle in that it depends on how we're setting things up elsewhere, as you mentioned below the commit message; I think it deserves a short explanation inline with the code. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/2455#note_29301
participants (2)
-
Jan Sikorski (@jsikorski) -
Stefan Dösinger