https://bugs.winehq.org/show_bug.cgi?id=45901
Bug ID: 45901 Summary: Avoid GPU synchronization due to GPU-CPU transfer (Overwatch) Product: Wine Version: unspecified Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: enhancement Priority: P2 Component: directx-d3d Assignee: wine-bugs@winehq.org Reporter: awesie@gmail.com Distribution: ---
Created attachment 62422 --> https://bugs.winehq.org/attachment.cgi?id=62422 Patch to avoid GPU synchronization
One performance gap with Overwatch on Wine + AMD + Mesa is GPU synchronization due to transfers from GPU texture to CPU texture. By using PBOs and glReadPixels with GL_PIXEL_PACK_BUFFER, we can avoid this synchronization leading to improved performance.
The attached patch is one approach to resolve the performance issue, but I am not sure that it is the best way. Feedback is appreciated.
Also attached is before and after screenshots with Mesa's HUD. You can observe that the buffer wait time decreases with the patched version. Both the before and after screenshots are on wine-staging master plus my patch for 45723.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #1 from Andrew Wesie awesie@gmail.com --- Created attachment 62423 --> https://bugs.winehq.org/attachment.cgi?id=62423 Overwatch screenshot without patch
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #2 from Andrew Wesie awesie@gmail.com --- Created attachment 62424 --> https://bugs.winehq.org/attachment.cgi?id=62424 Overwatch screenshot with patch
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #3 from Henri Verbeet hverbeet@gmail.com --- What kind of resources and APIs are involved here? I guess a copy from a DEFAULT texture to a STAGING texture? What about format and CPUAccessFlags? I think we'd prefer to avoid ReadPixels() if we can. What is the issue with GetTexImage()? Just that it doesn't work well with PBOs? Or something else?
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #4 from Andrew Wesie awesie@gmail.com --- It looks like Overwatch has a 1D render target texture that it copies to a 1D staging texture. It has a rotation of 5 destination textures, likely to prevent blocking the GPU. In the excerpts below, I only include one destination texture.
Source texture init:
d3d11_device_CreateTexture1D iface 0xc1dc0, desc 0xc46e560, data (nil), texture 0x7f4fbfe07ce8 wined3d_texture_init texture 0x61ef5400, resource_type WINED3D_RTYPE_TEXTURE_1D, format WINED3DFMT_R32_FLOAT, multisample_type 0, multisample_quality 0, usage WINED3DUSAGE_RENDERTARGET | WINED3DUSAGE_TEXTURE, access WINED3D_RESOURCE_ACCESS_GPU, width 1024, height 1, depth 1, layer_count 1, level_count 1, flags 0, ...
Destination texture init:
d3d11_device_CreateTexture1D iface 0xc1dc0, desc 0xc46f150, data (nil), texture 0xaeb050 wined3d_texture_init texture 0x16f01ab0, resource_type WINED3D_RTYPE_TEXTURE_1D, format WINED3DFMT_R32_FLOAT, multisample_type 0, multisample_quality 0, usage 0, access WINED3D_RESOURCE_ACCESS_CPU | WINED3D_RESOURCE_ACCESS_MAP_R, width 1024, height 1, depth 1, layer_count 1, level_count 1, flags 0, ...
Copy:
d3d11_immediate_context_CopyResource iface 0xc1df0, dst_resource 0x16f01a40, src_resource 0x61ef6950 wined3d_device_copy_resource device 0xd9740, dst_resource 0x16f01ab0, src_resource 0x61ef5400 wined3d_cs_run Executing WINED3D_CS_OP_BLT_SUB_RESOURCE texture2d_blt dst_texture 0x16f01ab0, dst_sub_resource_idx 0, dst_box (0, 0, 0)-(1024, 1, 1), src_texture 0x61ef5400, src_sub_resource_idx 0, src_box (0, 0, 0)-(1024, 1, 1), flags 0x20000000, fx 0xc6d0494, filter WINED3D_TEXF_POINT
The reason for using glReadPixels (e.g. texture2d_read_from_framebuffer) in my patch is because of AMD + Mesa. Based on the Mesa source code and empirical testing, GetTexImage (implemented with GetTexSubImage) does not have any special support on AMD, unlike i965: ./mesa/drivers/dri/i965/intel_tex_image.c: functions->GetTexSubImage = intel_get_tex_sub_image;
whereas glReadPixels has support on AMD that prevents the GPU synchronization: ./mesa/drivers/dri/i965/intel_pixel.c: functions->ReadPixels = intelReadPixels; ./mesa/drivers/dri/i915/intel_pixel.c: functions->ReadPixels = intelReadPixels; ./mesa/drivers/dri/r200/r200_state.c: functions->ReadPixels = radeonReadPixels; ./mesa/drivers/dri/radeon/radeon_state.c: ctx->Driver.ReadPixels = radeonReadPixels;
The default implementation of GetTexSubImage will resort to mapping the PBO and doing a copy on the CPU side. The radeon implementation of ReadPixels does the expected thing of asking the GPU to copy the texture into the PBO.
https://bugs.winehq.org/show_bug.cgi?id=45901
Józef Kucia joseph.kucia@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |joseph.kucia@gmail.com Ever confirmed|0 |1 Status|UNCONFIRMED |NEW
--- Comment #5 from Józef Kucia joseph.kucia@gmail.com --- (In reply to Andrew Wesie from comment #4)
The reason for using glReadPixels (e.g. texture2d_read_from_framebuffer) in my patch is because of AMD + Mesa. Based on the Mesa source code and empirical testing, GetTexImage (implemented with GetTexSubImage) does not have any special support on AMD, unlike i965: ./mesa/drivers/dri/i965/intel_tex_image.c: functions->GetTexSubImage = intel_get_tex_sub_image;
whereas glReadPixels has support on AMD that prevents the GPU synchronization: ./mesa/drivers/dri/i965/intel_pixel.c: functions->ReadPixels = intelReadPixels; ./mesa/drivers/dri/i915/intel_pixel.c: functions->ReadPixels = intelReadPixels; ./mesa/drivers/dri/r200/r200_state.c: functions->ReadPixels = radeonReadPixels; ./mesa/drivers/dri/radeon/radeon_state.c: ctx->Driver.ReadPixels = radeonReadPixels;
I don't think that you are using the classic Radeon driver. You probably use one of the Gallium drivers, radeonsi most likely.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #6 from Andrew Wesie awesie@gmail.com --- (In reply to Józef Kucia from comment #5)
I don't think that you are using the classic Radeon driver. You probably use one of the Gallium drivers, radeonsi most likely.
You are correct. Those were the wrong parts of Mesa to highlight.
In ./mesa/state_tracker/st_cb_texture.c:st_GetTexSubImage, which is the code path for Gallium drivers, it will blit into a tmp texture, map the PBO, map the tmp texture, and memcpy:
st->pipe->blit(st->pipe, &blit); pixels = _mesa_map_pbo_dest(ctx, &ctx->Pack, pixels); map = pipe_transfer_map_3d(pipe, dst, 0, PIPE_TRANSFER_READ, ...); ... memcpy(dest, slice_map, bytesPerRow); ... pipe_transfer_unmap(pipe, tex_xfer); _mesa_unmap_pbo_dest(ctx, &ctx->Pack);
In mesa/state_tracker/st_cb_readpixels.c:st_ReadPixels it has a fast path for PBOs:
if (st->pbo.download_enabled && _mesa_is_bufferobj(pack->BufferObj)) { if (try_pbo_readpixels(st, strb, st_fb_orientation(ctx->ReadBuffer) == Y_0_TOP, x, y, width, height, src_format, dst_format, pack, pixels)) return; }
The rationale for using glReadPixels is the lack of hardware support for GetTexSubImage in Mesa's Gallium drivers.
While try_pbo_readpixels looks very complex, to me at least, it avoids the very costly CPU-GPU synchronization.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #7 from Józef Kucia joseph.kucia@gmail.com --- I think that one problem with your patch is that it doesn't solve the issue completely. It fixes the problem only when the source resource is a render target. Otherwise, we still use the slow GetTexImage() path.
Other possible ways to fix the issue: 1) Optimize GetTexImage() in Mesa. 2) Use OpenGL resources for staging resources in wined3d. I think it could avoid the issue completely. CopyResource() would be a normal raw blit between 2 GL textures. However, it isn't something that we should change lightly. It needs some consideration, and I'm not sure if it's a very good idea.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #8 from Henri Verbeet hverbeet@gmail.com --- (In reply to Józef Kucia from comment #7)
I think that one problem with your patch is that it doesn't solve the issue completely. It fixes the problem only when the source resource is a render target. Otherwise, we still use the slow GetTexImage() path.
Other possible ways to fix the issue:
- Optimize GetTexImage() in Mesa.
I think we want this.
- Use OpenGL resources for staging resources in wined3d. I think it could
avoid the issue completely. CopyResource() would be a normal raw blit between 2 GL textures. However, it isn't something that we should change lightly. It needs some consideration, and I'm not sure if it's a very good idea.
Well, one issue with that approach is that you're moving the GPU stall to mapping the staging texture, since now the staging texture stays on the GPU until it's mapped.
I think using PBOs (instead of textures) to back staging resources may be ok, but you're right that that's a major change. I'm thinking we could then have texture2d_download_data() download directly into the destination PBO, similar to how wined3d_texture_upload_data() takes a wined3d_bo_address for its destination data. texture2d_blt() could then handle downloads the same way it handles uploads. That still potentially stalls on mapping the staging texture, but only if the copy hasn't finished yet by the time we try to map the PBO, which I think may be the expected behaviour.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #9 from Andrew Wesie awesie@gmail.com --- Based on the feedback, maybe this should be split in to two patches.
The first patch adds fast path for downloading a GPU texture to a CPU texture using PBO (e.g. using texture2d_download_data but passing in a bo_address instead of dst_location).
The second patch adds a special case to texture2d_download_data to use glReadPixels for compatible textures (e.g. those that have a render target usage?). Unless we think that getting accelerated glGetTexImage in Mesa is possible.
I am curious about the support for accelerated glGetTexImage in AMD and NVIDIA proprietary drivers. I'd imagine it exists since performance is the main benefit of PBOs.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #10 from Henri Verbeet hverbeet@gmail.com --- Yeah, I think that makes sense. We'll probably want creating PBOs for staging textures to be a separate patch as well.
I suspect it shouldn't be too hard to make Gallium GetTexImage() work with PBOs.
https://bugs.winehq.org/show_bug.cgi?id=45901
zzzzzyzz@hacari.org changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |zzzzzyzz@hacari.org
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #11 from Henri Verbeet hverbeet@gmail.com --- So with what's currently in Wine, this should be resolved everywhere except on the Gallium-based Mesa drivers, right? I'm tempted to just nominate the final patch in the series for staging. Not so much because there's anything wrong with the patch itself, but because it's essentially a workaround for a shortcoming in the driver.
If you could send a patch or bug report to Mesa, that would be great. If not, one of us can look into it; I think it should be reasonably straightforward.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #12 from Andrew Wesie awesie@gmail.com --- I'll double check next week on NVIDIA. I think wine-staging is fine for the last patch.
https://bugs.winehq.org/show_bug.cgi?id=45901
--- Comment #13 from Andrew Wesie awesie@gmail.com --- I put together a patch based on readpixels and submitted it in a bug report to Mesa, along with a piglit test case.
https://bugs.freedesktop.org/show_bug.cgi?id=108263
https://bugs.winehq.org/show_bug.cgi?id=45901
mirh mirh@protonmail.ch changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |mirh@protonmail.ch
--- Comment #14 from mirh mirh@protonmail.ch --- There has been an absolute buttload of work on this front (discussion is hot in these very days)
https://gitlab.freedesktop.org/mesa/mesa/-/issues/1030 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12096 https://gitlab.freedesktop.org/mesa/mesa/-/issues/4735 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11984
Btw is this in any way related to bug 44315?
https://bugs.winehq.org/show_bug.cgi?id=45901
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED CC| |z.figura12@gmail.com Resolution|--- |NOTOURBUG
--- Comment #15 from Zeb Figura z.figura12@gmail.com --- (In reply to Henri Verbeet from comment #10)
I suspect it shouldn't be too hard to make Gallium GetTexImage() work with PBOs.
This has indeed been done by Pierre-Eric Pelloux-Prayer upstream: https://gitlab.freedesktop.org/mesa/mesa/-/commit/41e093fc98c269279a100c3dd25ed911a7eeec58
https://bugs.winehq.org/show_bug.cgi?id=45901
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #16 from Zeb Figura z.figura12@gmail.com --- (In reply to Zeb Figura from comment #15)
(In reply to Henri Verbeet from comment #10)
I suspect it shouldn't be too hard to make Gallium GetTexImage() work with PBOs.
This has indeed been done by Pierre-Eric Pelloux-Prayer upstream: https://gitlab.freedesktop.org/mesa/mesa/-/commit/ 41e093fc98c269279a100c3dd25ed911a7eeec58
And released as part of Mesa 21.3.0.