This series aims to improve GPU-side performance by avoiding VK_IMAGE_LAYOUT_GENERAL for textures that are used as render target and shader resource view. To do so, we have to transition them between VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL depending on use (and likewise for depth stencils).
It improves performance of Rocket League from 80 fps to 100 fps in a GPU limited configuration on my Radeon Polaris GPU.
This MR is marked as draft for now because I am not convinced by patch 3 yet. For actual submission I think I'll create separate MRs for patches 1-3 and 4-8.
Patch 2 introduces a validation layer error in the d3d11 tests that gets fixed in patch 3. No new test failures are introduced, although none of the existing ones are fixed either.
-- v2: wined3d: Fall back to general layout if necessary. wined3d: Avoid VK_IMAGE_LAYOUT_GENERAL. wined3d: Sync depth stencils to both early and late depth test.
From: Stefan Dösinger stefan@codeweavers.com
We don't know which test the next draw will use when we place the barrier. We may be able to figure this out, but it is not trivial.
This avoids a read-after-write hazard after the next patch. The layout transition from shader-read-only to depth stencil needs to be finished before the early depth test in a follow-up draw if this draw uses early depth. --- dlls/wined3d/resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/dlls/wined3d/resource.c b/dlls/wined3d/resource.c index f8fa34727bd..471d2f32f03 100644 --- a/dlls/wined3d/resource.c +++ b/dlls/wined3d/resource.c @@ -582,7 +582,7 @@ VkPipelineStageFlags vk_pipeline_stage_mask_from_bind_flags(uint32_t bind_flags) if (bind_flags & WINED3D_BIND_RENDER_TARGET) flags |= VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; if (bind_flags & WINED3D_BIND_DEPTH_STENCIL) - flags |= VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT; + flags |= VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT; if (bind_flags & WINED3D_BIND_STREAM_OUTPUT) flags |= VK_PIPELINE_STAGE_TRANSFORM_FEEDBACK_BIT_EXT;
From: Stefan Dösinger stefan@codeweavers.com
This improves GPU-side performance considerably by allowing the driver to keep lossless texture compression enabled for textures that are used as both render taget / depth stencil and shader resource.
---
I am not entirely happy with how default_image_info.imageLayout is handled. wined3d_texture_vk_get_default_image_info is called by wined3d_rendertarget_view_vk_get_image_view, although only the image view is taken, and not the entire descriptor. UAVs work because the BIND_UAV flag forces the texture to use VK_IMAGE_LAYOUT_GENERAL.
I contemplated making wined3d_texture_vk_get_default_image_info return only the view handle, and not the entire image info, but then we'd need extra storage to store the VkDescriptorImageInfo for each VkWriteDescriptorSet entry. I am not convinced this is better. One possibility is doing away with the default view entirely. --- dlls/wined3d/texture.c | 83 ++++++++++++++++++++++++++++++------------ dlls/wined3d/view.c | 5 ++- 2 files changed, 63 insertions(+), 25 deletions(-)
diff --git a/dlls/wined3d/texture.c b/dlls/wined3d/texture.c index 626b0908c68..68d6fe9c5d9 100644 --- a/dlls/wined3d/texture.c +++ b/dlls/wined3d/texture.c @@ -4906,7 +4906,10 @@ const VkDescriptorImageInfo *wined3d_texture_vk_get_default_image_info(struct wi TRACE("Created image view 0x%s.\n", wine_dbgstr_longlong(texture_vk->default_image_info.imageView));
texture_vk->default_image_info.sampler = VK_NULL_HANDLE; - texture_vk->default_image_info.imageLayout = texture_vk->layout; + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; + else + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
return &texture_vk->default_image_info; } @@ -5511,26 +5514,18 @@ BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, if (resource->bind_flags & WINED3D_BIND_UNORDERED_ACCESS) vk_usage |= VK_IMAGE_USAGE_STORAGE_BIT;
- texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; - if (wined3d_popcount(resource->bind_flags) == 1) + if (resource->bind_flags & WINED3D_BIND_UNORDERED_ACCESS) + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; + else if (resource->bind_flags & WINED3D_BIND_RENDER_TARGET) + texture_vk->layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; + else if (resource->bind_flags & WINED3D_BIND_DEPTH_STENCIL) + texture_vk->layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; + else if (resource->bind_flags & WINED3D_BIND_SHADER_RESOURCE) + texture_vk->layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; + else { - switch (resource->bind_flags) - { - case WINED3D_BIND_RENDER_TARGET: - texture_vk->layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; - break; - - case WINED3D_BIND_DEPTH_STENCIL: - texture_vk->layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; - break; - - case WINED3D_BIND_SHADER_RESOURCE: - texture_vk->layout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; - break; - - default: - break; - } + FIXME("unexpected bind flags %s, using VK_IMAGE_LAYOUT_GENERAL\n", wined3d_debug_bind_flags(resource->bind_flags)); + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; }
if (!wined3d_context_vk_create_image(context_vk, vk_image_type, vk_usage, format_vk->vk_format, @@ -5540,6 +5535,10 @@ BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, return FALSE; }
+ /* We can't use a zero src access mask without synchronization2. Set the last-used bind mask to something + * non-zero to avoid this. */ + texture_vk->bind_mask = resource->bind_flags; + vk_range.aspectMask = vk_aspect_mask_from_format(&format_vk->f); vk_range.baseMipLevel = 0; vk_range.levelCount = VK_REMAINING_MIP_LEVELS; @@ -5706,15 +5705,48 @@ HRESULT wined3d_texture_vk_init(struct wined3d_texture_vk *texture_vk, struct wi flags, device, parent, parent_ops, &texture_vk[1], &wined3d_texture_vk_ops); }
+enum VkImageLayout wined3d_layout_from_bind_mask(const struct wined3d_texture_vk *texture_vk, const uint32_t bind_mask) +{ + assert(wined3d_popcount(bind_mask) == 1); + + /* We want to avoid switching between LAYOUT_GENERAL and other layouts. In Radeon GPUs (and presumably + * others), this will trigger decompressing and recompressing the texture. We also hardcode the layout + * into views when they are created. */ + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + return VK_IMAGE_LAYOUT_GENERAL; + + switch (bind_mask) + { + case WINED3D_BIND_RENDER_TARGET: + return VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; + + case WINED3D_BIND_DEPTH_STENCIL: + return VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; + + case WINED3D_BIND_SHADER_RESOURCE: + return VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; + + default: + ERR("Unexpected bind mask %s.\n", wined3d_debug_bind_flags(bind_mask)); + return VK_IMAGE_LAYOUT_GENERAL; + } +} + void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, struct wined3d_context_vk *context_vk, uint32_t bind_mask) { + enum VkImageLayout new_layout; uint32_t src_bind_mask = 0;
TRACE("texture_vk %p, context_vk %p, bind_mask %s.\n", texture_vk, context_vk, wined3d_debug_bind_flags(bind_mask));
- if (bind_mask & ~WINED3D_READ_ONLY_BIND_MASK) + new_layout = wined3d_layout_from_bind_mask(texture_vk, bind_mask); + + /* A layout transition is potentially a read-write operation, so even if we + * prepare the texture to e.g. read only shader resource mode, we have to wait + * for past operations to finish. */ + if (bind_mask & ~WINED3D_READ_ONLY_BIND_MASK || new_layout != texture_vk->layout) { src_bind_mask = texture_vk->bind_mask & WINED3D_READ_ONLY_BIND_MASK; if (!src_bind_mask) @@ -5732,8 +5764,9 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, { VkImageSubresourceRange vk_range;
- TRACE(" %s -> %s.\n", - wined3d_debug_bind_flags(src_bind_mask), wined3d_debug_bind_flags(bind_mask)); + TRACE(" %s(%x) -> %s(%x).\n", + wined3d_debug_bind_flags(src_bind_mask), texture_vk->layout, + wined3d_debug_bind_flags(bind_mask), new_layout);
vk_range.aspectMask = vk_aspect_mask_from_format(texture_vk->t.resource.format); vk_range.baseMipLevel = 0; @@ -5745,7 +5778,9 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, vk_pipeline_stage_mask_from_bind_flags(src_bind_mask), vk_pipeline_stage_mask_from_bind_flags(bind_mask), vk_access_mask_from_bind_flags(src_bind_mask), vk_access_mask_from_bind_flags(bind_mask), - texture_vk->layout, texture_vk->layout, texture_vk->image.vk_image, &vk_range); + texture_vk->layout, new_layout, texture_vk->image.vk_image, &vk_range); + + texture_vk->layout = new_layout; } }
diff --git a/dlls/wined3d/view.c b/dlls/wined3d/view.c index 9d62c4a5089..6bdc865b85e 100644 --- a/dlls/wined3d/view.c +++ b/dlls/wined3d/view.c @@ -1190,7 +1190,10 @@ static void wined3d_shader_resource_view_vk_cs_init(void *object)
srv_vk->view_vk.u.vk_image_info.imageView = vk_image_view; srv_vk->view_vk.u.vk_image_info.sampler = VK_NULL_HANDLE; - srv_vk->view_vk.u.vk_image_info.imageLayout = texture_vk->layout; + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + srv_vk->view_vk.u.vk_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; + else + srv_vk->view_vk.u.vk_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL; }
HRESULT wined3d_shader_resource_view_vk_init(struct wined3d_shader_resource_view_vk *view_vk,
From: Stefan Dösinger stefan@codeweavers.com
If a texture is used as shader resource and DS/RT at the same time, including cases where a different subresource is bound to each.
In theory this could be handled better, but would require per-subresource layout tracking.
---
It might make sense to merge this into the previous patch. Without it, we get a validation error in test_sample_attached_rtv. I feel it is easier to review them separately and we have plenty of pre-existing test and validation errors with the vulkan backend. --- dlls/wined3d/context_vk.c | 22 +++++++++++++++++++++- dlls/wined3d/texture.c | 29 +++++++++++++++++++++++++++++ dlls/wined3d/view.c | 8 +++++++- dlls/wined3d/wined3d_private.h | 6 +++++- 4 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/dlls/wined3d/context_vk.c b/dlls/wined3d/context_vk.c index 1acb8a8d201..10713985e7e 100644 --- a/dlls/wined3d/context_vk.c +++ b/dlls/wined3d/context_vk.c @@ -2599,6 +2599,14 @@ static bool wined3d_context_vk_begin_render_pass(struct wined3d_context_vk *cont continue;
rtv_vk = wined3d_rendertarget_view_vk(view); + + if (rtv_vk->v.resource->bind_count) + { + struct wined3d_texture_vk *texture_vk; + texture_vk = wined3d_texture_vk(wined3d_texture_from_resource(rtv_vk->v.resource)); + wined3d_texture_vk_make_generic(texture_vk, context_vk); + } + vk_views[attachment_count] = wined3d_rendertarget_view_vk_get_image_view(rtv_vk, context_vk); wined3d_rendertarget_view_vk_barrier(rtv_vk, context_vk, WINED3D_BIND_RENDER_TARGET); wined3d_context_vk_reference_rendertarget_view(context_vk, rtv_vk); @@ -2634,6 +2642,14 @@ static bool wined3d_context_vk_begin_render_pass(struct wined3d_context_vk *cont if ((view = state->fb.depth_stencil)) { rtv_vk = wined3d_rendertarget_view_vk(view); + + if (rtv_vk->v.resource->bind_count) + { + struct wined3d_texture_vk *texture_vk; + texture_vk = wined3d_texture_vk(wined3d_texture_from_resource(rtv_vk->v.resource)); + wined3d_texture_vk_make_generic(texture_vk, context_vk); + } + vk_views[attachment_count] = wined3d_rendertarget_view_vk_get_image_view(rtv_vk, context_vk); wined3d_rendertarget_view_vk_barrier(rtv_vk, context_vk, WINED3D_BIND_DEPTH_STENCIL); wined3d_context_vk_reference_rendertarget_view(context_vk, rtv_vk); @@ -3023,7 +3039,11 @@ static bool wined3d_shader_descriptor_writes_vk_add_srv_write(struct wined3d_sha struct wined3d_texture_vk *texture_vk = wined3d_texture_vk(texture_from_resource(resource));
if (view_vk->u.vk_image_info.imageView) + { image_info = &view_vk->u.vk_image_info; + if (image_info->imageLayout != texture_vk->layout) + wined3d_shader_resource_view_vk_update_layout(srv_vk, texture_vk->layout); + } else image_info = wined3d_texture_vk_get_default_image_info(texture_vk, context_vk); buffer_view = NULL; @@ -3389,7 +3409,7 @@ static void wined3d_context_vk_load_shader_resources(struct wined3d_context_vk * { if (!srv_vk->view_vk.bo_user.valid) { - wined3d_shader_resource_view_vk_update(srv_vk, context_vk); + wined3d_shader_resource_view_vk_update_buffer(srv_vk, context_vk); if (pipeline == WINED3D_PIPELINE_GRAPHICS) context_invalidate_state(&context_vk->c, STATE_GRAPHICS_SHADER_RESOURCE_BINDING); else diff --git a/dlls/wined3d/texture.c b/dlls/wined3d/texture.c index 68d6fe9c5d9..e6618825380 100644 --- a/dlls/wined3d/texture.c +++ b/dlls/wined3d/texture.c @@ -5784,6 +5784,35 @@ void wined3d_texture_vk_barrier(struct wined3d_texture_vk *texture_vk, } }
+/* This is called when a texture is used as render target and shader resource + * or depth stencil and shader resource at the same time. This can either be + * read-only simultaneos use as depth stencil, but also for rendering to one + * subresource while reading from another. Without tracking of barriers and + * layouts per subresource VK_IMAGE_LAYOUT_GENERAL is the only thing we can do. */ +void wined3d_texture_vk_make_generic(struct wined3d_texture_vk *texture_vk, + struct wined3d_context_vk *context_vk) +{ + VkImageSubresourceRange vk_range; + + if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL) + return; + + vk_range.aspectMask = vk_aspect_mask_from_format(texture_vk->t.resource.format); + vk_range.baseMipLevel = 0; + vk_range.levelCount = VK_REMAINING_MIP_LEVELS; + vk_range.baseArrayLayer = 0; + vk_range.layerCount = VK_REMAINING_ARRAY_LAYERS; + + wined3d_context_vk_image_barrier(context_vk, wined3d_context_vk_get_command_buffer(context_vk), + VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, + VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, + 0, 0, + texture_vk->layout, VK_IMAGE_LAYOUT_GENERAL, texture_vk->image.vk_image, &vk_range); + + texture_vk->layout = VK_IMAGE_LAYOUT_GENERAL; + texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL; +} + static void ffp_blitter_destroy(struct wined3d_blitter *blitter, struct wined3d_context *context) { struct wined3d_blitter *next; diff --git a/dlls/wined3d/view.c b/dlls/wined3d/view.c index 6bdc865b85e..a95ff13410f 100644 --- a/dlls/wined3d/view.c +++ b/dlls/wined3d/view.c @@ -1101,7 +1101,7 @@ HRESULT wined3d_shader_resource_view_gl_init(struct wined3d_shader_resource_view return hr; }
-void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_vk *srv_vk, +void wined3d_shader_resource_view_vk_update_buffer(struct wined3d_shader_resource_view_vk *srv_vk, struct wined3d_context_vk *context_vk) { const struct wined3d_format_vk *view_format_vk = wined3d_format_vk(srv_vk->v.format); @@ -1120,6 +1120,12 @@ void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_ } }
+void wined3d_shader_resource_view_vk_update_layout(struct wined3d_shader_resource_view_vk *srv_vk, + VkImageLayout layout) +{ + srv_vk->view_vk.u.vk_image_info.imageLayout = layout; +} + static void wined3d_shader_resource_view_vk_cs_init(void *object) { struct wined3d_shader_resource_view_vk *srv_vk = object; diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h index f808852ffef..03c4f2ba404 100644 --- a/dlls/wined3d/wined3d_private.h +++ b/dlls/wined3d/wined3d_private.h @@ -4877,6 +4877,8 @@ const VkDescriptorImageInfo *wined3d_texture_vk_get_default_image_info(struct wi HRESULT wined3d_texture_vk_init(struct wined3d_texture_vk *texture_vk, struct wined3d_device *device, const struct wined3d_resource_desc *desc, unsigned int layer_count, unsigned int level_count, uint32_t flags, void *parent, const struct wined3d_parent_ops *parent_ops) DECLSPEC_HIDDEN; +void wined3d_texture_vk_make_generic(struct wined3d_texture_vk *texture_vk, + struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN; BOOL wined3d_texture_vk_prepare_texture(struct wined3d_texture_vk *texture_vk, struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN;
@@ -5594,8 +5596,10 @@ void wined3d_shader_resource_view_vk_generate_mipmap(struct wined3d_shader_resou HRESULT wined3d_shader_resource_view_vk_init(struct wined3d_shader_resource_view_vk *view_vk, const struct wined3d_view_desc *desc, struct wined3d_resource *resource, void *parent, const struct wined3d_parent_ops *parent_ops) DECLSPEC_HIDDEN; -void wined3d_shader_resource_view_vk_update(struct wined3d_shader_resource_view_vk *view_vk, +void wined3d_shader_resource_view_vk_update_buffer(struct wined3d_shader_resource_view_vk *view_vk, struct wined3d_context_vk *context_vk) DECLSPEC_HIDDEN; +void wined3d_shader_resource_view_vk_update_layout(struct wined3d_shader_resource_view_vk *srv_vk, + VkImageLayout layout) DECLSPEC_HIDDEN;
struct wined3d_unordered_access_view {
I updated the patches. The rotate bind flags one is already upstream. The patches optimizing blits are removed because I cannot detect any performance advantage with them with the write-after-write hazard fixed.
I'm also removing the draft marker from this merge request.
Jan Sikorski (@jsikorski) commented about dlls/wined3d/texture.c:
TRACE("Created image view 0x%s.\n", wine_dbgstr_longlong(texture_vk->default_image_info.imageView)); texture_vk->default_image_info.sampler = VK_NULL_HANDLE;
- texture_vk->default_image_info.imageLayout = texture_vk->layout;
- if (texture_vk->layout == VK_IMAGE_LAYOUT_GENERAL)
texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_GENERAL;
- else
texture_vk->default_image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
It looks correct, but fairly brittle in that it depends on how we're setting things up elsewhere, as you mentioned below the commit message; I think it deserves a short explanation inline with the code.