https://bugs.winehq.org/show_bug.cgi?id=44863
Bug ID: 44863
Summary: Performance regression in Prince of Persia 3D
Product: Wine
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: directx-d3d
Assignee: wine-bugs@winehq.org
Reporter: stefan@codeweavers.com
Distribution: ---
Prince of Persia 3D's performance went from perfectly smooth to about 0.5 fps. I suspect 0b92a6fba7a6e60c6ff1a3729a3b21019c2df0ce is to blame, but I have not run a regression test yet.
The problem is that the game creates a rather large (2MB) D3DVBCAPS_SYSTEMMEMORY vertex buffer, maps it (the entire buffer, due to API limitations), writes a handful of vertices and draws a handful of vertices. Currently wined3d uploads the entire 2MB, evicts the sysmem copy and downloads it back from the GPU on every map / unmap / draw cycle.
The most obvious performance fix is not to create a VBO. Doing this restores the performance, but questions remain.
On startup, the game writes "NetImmerse D3DDriver Info: Hardware supports system memory textures" and "NetImmerse D3DDriver Info: No AGP support detected". The first info seems wrong, so it is possible that the game enters a codepath it does not choose on Windows.
Not creating a VBO is not an option on Core Contexts, so I investigated what's going wrong with the PBO codepath. First of all, evicting the sysmem copy seems like a bad choice. It happens because ddraw buffers are not marked dynamic; we may want to change this. The game uses d3d3, so there's no DDLOCK_DISCARDCONTENTS. The game passes DDLOCK_WAIT | DDLOCK_WRITEONLY to IDirect3DVertexBuffer::Lock.
Commenting out the eviction call improves performance quite a bit, but it is still noticeably slow. wined3d_buffer_map maps through heap_memory instead of glMapBuffer because of the "(flags & WINED3D_MAP_WRITE) && !(flags & (WINED3D_MAP_NOOVERWRITE | WINED3D_MAP_DISCARD))" condition.
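The quoted condition amounts to a small heuristic: a plain write map (neither DISCARD nor NOOVERWRITE) goes through the system-memory staging copy, because mapping the buffer object directly could stall on the GPU. Here is a minimal sketch of that decision; the flag values and the function name are made up for the example and do not match the real definitions in wined3d_private.h.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the wined3d map flags; the real values
 * differ from these. */
#define MAP_WRITE       0x1u
#define MAP_DISCARD     0x2u
#define MAP_NOOVERWRITE 0x4u

/* Sketch of the condition quoted above: write maps without DISCARD or
 * NOOVERWRITE are served from heap_memory instead of glMapBuffer. */
bool map_through_sysmem(unsigned int flags)
{
    return (flags & MAP_WRITE)
            && !(flags & (MAP_NOOVERWRITE | MAP_DISCARD));
}
```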
Removing this condition uses glMapBuffer, but does not improve performance. It seems the large glMapBuffer is still slow, at least on OSX with legacy contexts.
So there are a few questions that need to be answered:
*) Is the game using a broken codepath?
*) Write tests for sysmem buffers
*) Consider making all d3d3 buffers dynamic
*) Test if the glMapBuffer path is fast on Linux
*) Investigate if Core Contexts + GL_ARB_buffer_storage help on OSX.
Stefan Dösinger stefan@codeweavers.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Keywords        |            |regression
joaopa jeremielapuree@yahoo.fr changed:
What            |Removed     |Added
----------------------------------------------------------------------------
CC              |            |jeremielapuree@yahoo.fr
--- Comment #1 from joaopa jeremielapuree@yahoo.fr --- Bug happens with the demo https://www.fileplanet.com/22571/download/Prince-of-Persia-Demo
Can an administrator put the link at the right place?
Stefan Dösinger stefan@codeweavers.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
URL             |            |https://www.fileplanet.com/22571/download/Prince-of-Persia-Demo
Keywords        |            |download
--- Comment #2 from Matteo Bruni matteo.mystral@gmail.com --- (In reply to Stefan Dösinger from comment #0)
Just a couple of comments...
> *) Investigate if Core Contexts + GL_ARB_buffer_storage help on OSX.
I don't think ARB_buffer_storage is a thing on macOS.
FWIW, I agree it probably makes sense to mark all buffers as DYNAMIC on old d3d.
tokktokk fdsfgs@krutt.org changed:
What            |Removed     |Added
----------------------------------------------------------------------------
CC              |            |fdsfgs@krutt.org
--- Comment #3 from joaopa jeremielapuree@yahoo.fr --- Unfortunately, the bug is still there in current wine (3.20).
--- Comment #4 from joaopa jeremielapuree@yahoo.fr --- Bug still occurs with wine-4.9.
pattietreutel katyaberezyaka@gmail.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
CC              |            |katyaberezyaka@gmail.com
--- Comment #5 from joaopa jeremielapuree@yahoo.fr --- Bug still occurs with wine-4.17.
--- Comment #6 from joaopa jeremielapuree@yahoo.fr --- Bug still occurs with wine-5.18
--- Comment #7 from joaopa jeremielapuree@yahoo.fr --- Bug still occurs with wine-7.0-rc5.
Alex Henrie alexhenrie24@gmail.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
CC              |            |alexhenrie24@gmail.com
See Also        |            |https://bugs.winehq.org/show_bug.cgi?id=42588,
                |            |https://bugs.winehq.org/show_bug.cgi?id=43246
Version         |unspecified |2.1
Regression SHA1 |            |15d53761a5fbfc12fc5f9974c029dace00eab33d
--- Comment #8 from Alex Henrie alexhenrie24@gmail.com --- I can confirm. `git bisect` says:
15d53761a5fbfc12fc5f9974c029dace00eab33d is the first bad commit
commit 15d53761a5fbfc12fc5f9974c029dace00eab33d
Author: Henri Verbeet hverbeet@codeweavers.com
Date: Tue Jan 31 15:47:10 2017 +0100
wined3d: Do not pin system memory in wined3d_buffer_load_location().
Signed-off-by: Henri Verbeet hverbeet@codeweavers.com Signed-off-by: Alexandre Julliard julliard@winehq.org
dlls/wined3d/buffer.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
--- Comment #9 from Alex Henrie alexhenrie24@gmail.com --- I should probably mention that I did the regression test with an AMD Renoir integrated GPU on Ubuntu 20.04. The difference in performance before and after commit 15d53761a5fb was pretty dramatic.
--- Comment #10 from Henri Verbeet hverbeet@gmail.com --- The way this works in current ddraw/wined3d is quite different, so I suspect not much of the original analysis still applies.
Is the application using d3d_device3_DrawIndexedPrimitiveVB() with that particular vertex buffer? We currently pass "vb->size / stride" as "vertex_count" to d3d_device7_DrawIndexedPrimitiveVB(). Perhaps it would be advantageous to determine actual upper and lower vertex indices when "index_count" is significantly smaller than the number of vertices contained in the vertex buffer.
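The idea of determining the actual referenced range could be sketched as a scan over the 16-bit index data (ddraw indices are 16 bit). This is an illustrative sketch, not the actual ddraw code; the function and parameter names are made up for the example.

```c
#include <stddef.h>

/* Instead of passing "vb->size / stride" as the vertex count, find the
 * smallest and largest vertex referenced by the index data, so only
 * that range needs to be considered for the draw. */
void get_index_range(const unsigned short *indices, size_t index_count,
        unsigned int *min_vertex, unsigned int *vertex_count)
{
    unsigned int lo = ~0u, hi = 0;
    size_t i;

    if (!index_count)
    {
        *min_vertex = 0;
        *vertex_count = 0;
        return;
    }

    for (i = 0; i < index_count; ++i)
    {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    *min_vertex = lo;
    *vertex_count = hi - lo + 1;
}
```

For this game, where each draw references only the first few vertices of a 65536-vertex buffer, the resulting range would be a tiny fraction of the full "vb->size / stride" count.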
Zeb Figura z.figura12@gmail.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
CC              |            |z.figura12@gmail.com
--- Comment #11 from Zeb Figura z.figura12@gmail.com --- I don't know why I'm seeing something different than Stefan, but all of the vertex buffers I'm seeing have D3DVBCAPS_WRITEONLY set, and nothing else. So that can't be the problem.
Forcing locks to be NOOVERWRITE actually doesn't break rendering, but it doesn't make it any faster either. Not sure what the bottleneck actually is yet...
--- Comment #12 from Zeb Figura z.figura12@gmail.com --- Created attachment 73868 --> https://bugs.winehq.org/attachment.cgi?id=73868
concatenated patches
The game juggles two vertex buffers, both 2 MB (65536 vertices). Each frame it repeatedly maps one vertex buffer and then draws with DrawIndexedPrimitiveVB, alternating which vertex buffer it uses. The draws themselves are small and only ever reference the first few vertices from the buffer, and there are a lot of them.
The attached two patches improve performance for me; the first from ~10 fps to ~25; the second up to 60. Apparently WINED3D_ACCESS_GPU | MAP_R | MAP_W without DYNAMIC is *really* bad for this usage pattern, even worse than uploading the whole buffer every frame. Which isn't exactly surprising.
I'd say this is questionable and needs tests, but honestly I'm not sure under what circumstance GPU | MAP_R | MAP_W is going to perform *better* than this?
--- Comment #13 from Stefan Dösinger stefan@codeweavers.com --- What happens if you make the buffer dynamic and keep it as a hardware buffer?
I'll try to find some time to look into this myself, check if I still see the SYSMEM cap flag, and compare the game's log output to that on Windows...
--- Comment #14 from Zeb Figura z.figura12@gmail.com --- (In reply to Stefan Dösinger from comment #13)
> What happens if you make the buffer dynamic and keep it as a hardware buffer?
Performance is pretty abysmal. Not as bad as upstream Wine, but still pretty bad. Removing MAP_R improves it a bit more, but not enough. The fact that the game isn't actually streaming (and given the pattern, I don't think we can hack in NOOVERWRITE either) implies that going further down this route is not going to help.
--- Comment #15 from Henri Verbeet hverbeet@gmail.com --- https://gitlab.winehq.org/wine/wine/-/merge_requests/2404 is intended to address this.
--- Comment #16 from Stefan Dösinger stefan@codeweavers.com --- After some discussion about the MR Henri linked to I experimented a bit more with this game and my suspected capability flag issues.
I managed to run the game on Windows 11 with Windows' own ddraw. It doesn't set up the screen correctly (only a part of the game is displayed), but I got a new game started. It is somewhat sluggish, I would guess 20-30 fps, but a lot better than what Wine currently gets.
While reading over the caps code I came across D3DDEVICEDESC::dwMaxVertexCount, which we currently set to 65536 (since ddraw is limited to 16-bit index buffers). Lowering this value to 1024, which my Radeon Windows 11 driver sets, makes the game use smaller (1024 vertices / 32 kB) vertex buffers and fixes the performance issues in this particular game. It doesn't answer the other questions MR 2404 raises, though.
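The sizes in this thread are consistent with a 32-byte vertex stride, which is an inference from the numbers reported here, not something the game confirms: 65536 vertices give the 2 MB buffers from the original report, and capping dwMaxVertexCount at 1024 gives 32 kB buffers. A sketch of the arithmetic:

```c
/* Byte size of a vertex buffer, given the vertex count the caps allow
 * and the (assumed) 32-byte stride this game appears to use. */
unsigned int vb_bytes(unsigned int vertex_count, unsigned int stride)
{
    return vertex_count * stride;
}
```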
Regarding the OutputDebugString diagnostics written by the game's 3D Engine:
"NetImmerse D3DDriver Info: Hardware supports system memory textures" is an effect of D3DDEVCAPS_TEXTURESYSTEMMEMORY, which we set and my windows driver does not. "NetImmerse D3DDriver Info: No AGP support detected" is printed because we do not set DDCAPS2_NONLOCALVIDMEM. With this flag set the output switches to "NetImmerse D3DDriver Info: 2x (Execute) AGP support detected", same as on Windows. Neither flag changes the game behavior and its handling of vertex data.
Zeb Figura z.figura12@gmail.com changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Status          |NEW         |RESOLVED
Fixed by SHA1   |            |4d289c0cb833384a07e85976d98d30e52b3686bd
Resolution      |---         |FIXED
--- Comment #17 from Zeb Figura z.figura12@gmail.com --- Should finally be fixed by https://source.winehq.org/git/wine.git/commitdiff/4d289c0cb833384a07e85976d98d30e52b3686bd.
--- Comment #18 from Alex Henrie alexhenrie24@gmail.com --- Well done Zeb, thank you!
Alexandre Julliard julliard@winehq.org changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Status          |RESOLVED    |CLOSED
--- Comment #19 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 9.9.