http://bugs.winehq.org/show_bug.cgi?id=32096
Bug #: 32096 Summary: drawStridedSlow can be more efficient Product: Wine Version: unspecified Platform: x86 OS/Version: Linux Status: UNCONFIRMED Severity: enhancement Priority: P2 Component: directx-d3d AssignedTo: wine-bugs@winehq.org ReportedBy: guillaum.bouchard@gmail.com Classification: Unclassified
Hi,
Recently I noticed that World of Warcaft is far more slow on d3d9 mode than using the OpenGL mode on my computer (pentium 4 and geforce 5 fx). Unfortunately the OpenGL mode is unusable since last WoW update (bug is inside WoW, not wine)
D3D mode is slower because an important amount of the primitives are rendered using drawStridedSlow (dlls/wined3d/drawprim.c) which generates a lot of opengl call (here, 600K calls compared to 12K in OpenGL mode).
It appears that if the computer can handle vertex array (ie: glDrawElements/glDrawArrays/gl[Vertex|Color|TexCoords|Fog]Pointer and glClientActiveTexture) the number of openGL call can be dramatically reduced, leading to an important increase in FPS (8 to 25 in important city in WoW).
On my computer, I hacked inside drawStritedSlow and I have only one issue, the format of color is packed in BGRA, which is not compatible with the format awaited by glColorPointer. This can be fixed using the extension ARB_vertex_array_BGRA, which is unfortunately not available on my computer. I solved this issue by looping over each vertex and building a custom array in memory that I submitted to glColorPointer. I got approximately the same FPS increase using this approach.
My conclusions here are that the fallback to drawStridedSlow is a bit extreme and in most case unnecessary. In my configuration everything works except the conversion from the d3d BGRA format for color in RGBA. This part can be easily done in software to create a suitable color array. Also, I did not dig inside the shader code of wined3d, but perhaps this conversion can be simply implemented as a swizzle in the custom vertex shader. (am I right guessing that wined3d does not uses the default shader of the fixed pipeline ?).
So this bug report is more a way of opening the discussion. I see many options to improve the situation:
a) Tweak the heuristics which detects and fallbacks to drawStridedSlow. If necessary implements the needed conversions in software. But this will not generates OpenGL function call and will increase performance a bit b) Tweak the vertex shader to do the conversion in the vertex shader, I think it is our best option c) Tweak drawStridedSlow to still rely on vertex array when possible (I have an example of code which is doing this. I'm playing WoW with this since 3 days without any issue and with a FPS increase of 200%). I don't think that honestly it is the good idea. d) do something else. Because I did not read anything else than the drawStritedSlow function, I may have missed something important.
I'm motivated to help in the DX -> GL code. I'm a skilled C and GL coder, I don't know DX, but can learn enough. Thank you for reading.
http://bugs.winehq.org/show_bug.cgi?id=32096
Austin English austinenglish@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |performance Version|unspecified |1.5.16
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #1 from guillaum.bouchard@gmail.com 2012-10-29 18:44:19 CDT --- Ok.
I dug a bit more inside the code, in dlls/wined3d/device.c, line 440, there is the slow_mask:
slow_mask |= -!gl_info->supported[ARB_VERTEX_ARRAY_BGRA] & ((1 << WINED3D_FFP_DIFFUSE) | (1 << WINED3D_FFP_SPECULAR));
Which is at true because my system does not support ARB_VERTEX_ARRAY_BGRA. And this is why the code fallbacks to immediate mode.
Instead, why not converting the color data to RGBA when drawing ? I'll try to do this tomorrow.
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #2 from Henri Verbeet hverbeet@gmail.com 2012-10-29 18:57:38 CDT --- (In reply to comment #0)
So this bug report is more a way of opening the discussion. I see many options to improve the situation:
a) Tweak the heuristics which detects and fallbacks to drawStridedSlow. If necessary implements the needed conversions in software. But this will not generates OpenGL function call and will increase performance a bit b) Tweak the vertex shader to do the conversion in the vertex shader, I think it is our best option c) Tweak drawStridedSlow to still rely on vertex array when possible (I have an example of code which is doing this. I'm playing WoW with this since 3 days without any issue and with a FPS increase of 200%). I don't think that honestly it is the good idea. d) do something else. Because I did not read anything else than the drawStritedSlow function, I may have missed something important.
Well, we have most of that, see e.g. wined3d_buffer_preload() and shader_glsl_swizzle_to_str(). You should probably figure out why that code doesn't get used for you. Personally though, I think it's probably more productive to work on fixing any issues in Mesa / Nouveau for your hardware, and just use those drivers instead.
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #3 from guillaum.bouchard@gmail.com 2012-10-30 04:45:18 CDT --- (In reply to comment #2)
Well, we have most of that, see e.g. wined3d_buffer_preload() and shader_glsl_swizzle_to_str().
Apparently wined3d_buffer_preload is only used for data inside *vertex buffer* (ie: VBO in GL, I don't know the d3d nomenclature).
This part of the WoW code appears to use *main memory buffer* (ie: vertex array in GL).
For shader_glsl_swizzle_to_str, I did not get where it is called and why. I'll try to understand that part of wined3d tonight.
You should probably figure out why that code doesn't get used for you.
I did a small test, using:
------------------------------- diff -r f1ba8a33d138 src/wine/dlls/wined3d/device.c --- a/src/wine/dlls/wined3d/device.c Tue Oct 30 00:15:15 2012 +0100 +++ b/src/wine/dlls/wined3d/device.c Tue Oct 30 08:25:10 2012 +0100 @@ -436,8 +436,10 @@ } else { + int bgra_supported = gl_info->supported[ARB_VERTEX_ARRAY_BGRA]; + bgra_supported = 1; WORD slow_mask = (1 << WINED3D_FFP_PSIZE); - slow_mask |= -!gl_info->supported[ARB_VERTEX_ARRAY_BGRA] + slow_mask |= -!bgra_supported & ((1 << WINED3D_FFP_DIFFUSE) | (1 << WINED3D_FFP_SPECULAR));
if ((stream_info->position_transformed || (stream_info->use_map & slow_mask)) && !stream_info->all_vbo) ------------------------------
(ie: I replaced the bgra_supported value, which is 0 on my computer, by one).
Automatically I got the speedup at the cost of swapped R and B component. The amazing part is that because the rest of the code is not aware of the fact that the implementation has ARB_VERTEX_ARRAY_BGRA, it does not try to use glColorPointer(GL_BGRA, ...) and hence does not crash the program.
By doing a more subtle change:
Always enable ARB_VERTEX_ARRAY_BGRA: ------------------------------------------------------ diff -r 69f3e069092b src/wine/dlls/wined3d/directx.c --- a/src/wine/dlls/wined3d/directx.c Tue Oct 30 08:31:01 2012 +0100 +++ b/src/wine/dlls/wined3d/directx.c Tue Oct 30 08:37:39 2012 +0100 @@ -2737,6 +2737,7 @@ TRACE(" IMPLIED: ARB_vertex_array_bgra support (by EXT_vertex_array_bgra).\n"); gl_info->supported[ARB_VERTEX_ARRAY_BGRA] = TRUE; } + gl_info->supported[ARB_VERTEX_ARRAY_BGRA] = TRUE; if (!gl_info->supported[ARB_TEXTURE_COMPRESSION_RGTC] && gl_info->supported[EXT_TEXTURE_COMPRESSION_RGTC]) { TRACE(" IMPLIED: ARB_texture_compression_rgtc support (by EXT_texture_compression_rgtc).\n"); -------------------------------------------------------
Change the call to glColorPointer:
---------------------------------------------------------- diff -r 69f3e069092b src/wine/dlls/wined3d/utils.c --- a/src/wine/dlls/wined3d/utils.c Tue Oct 30 08:31:01 2012 +0100 +++ b/src/wine/dlls/wined3d/utils.c Tue Oct 30 08:37:39 2012 +0100 @@ -1601,8 +1601,9 @@
if (gl_info->supported[ARB_VERTEX_ARRAY_BGRA]) { + printf("conversion to GL_BGRA, forced to 4\n"); idx = getFmtIdx(WINED3DFMT_B8G8R8A8_UNORM); - gl_info->formats[idx].gl_vtx_format = GL_BGRA; + gl_info->formats[idx].gl_vtx_format = 4; }
if (gl_info->supported[ARB_HALF_FLOAT_VERTEX]) -----------------------------------------------------------
Works the same, with the swap between R and B component. I'm surprised, because the debug (printf("conversion to GL_BGRA, forced to 4\n") is only called a few time. Does that mean that this part of the code is only called when declaring *vertex stream* (ie: VAO ?).
I propose the following changes:
a) in device.c, removes the test which fallbacks to drawPrimitiveSlow if ARB_VERTEX_ARRAY_BGRA is not available b) somewhere just before the glColorPointer call, convert the array with a simple loop
I have a prototype of this code running, I still need to find an elegant solution for the place to store the data because I think I cannot do the conversion in place. More later, I need to go to work.
Personally though, I think it's probably more productive to work on fixing any issues in Mesa / Nouveau for your hardware, and just use those drivers instead.
I'm using the Nvidia driver. Your answer makes me give the nouveau driver a chance, but it segfault when launching WoW. (Amazingly enough, it have the ARB_vertex_array_bgra extension available. Also blender starts to work with it, which was not the case last time I tried, so it may be worth looking)
Thank you for your comments
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #4 from Henri Verbeet hverbeet@gmail.com 2012-10-30 05:25:42 CDT --- (In reply to comment #3)
Apparently wined3d_buffer_preload is only used for data inside *vertex buffer* (ie: VBO in GL, I don't know the d3d nomenclature).
This part of the WoW code appears to use *main memory buffer* (ie: vertex array in GL).
Are you saying it uses user pointer draws through wined3d_device_draw_primitive_up()?
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #5 from Matteo Bruni matteo.mystral@gmail.com 2012-10-30 05:41:32 CDT --- Which GPU and Nvidia driver version are you using? GL_ARB_vertex_array_bgra is reported on my GTX 470 with current drivers (and I think it has been since a long time).
This part of the WoW code appears to use *main memory buffer* (ie: vertex array in GL).
I never debugged WoW so I don't know, but is it really going through Draw[Indexed]PrimitiveUP?
By doing a more subtle change:
Always enable ARB_VERTEX_ARRAY_BGRA:
diff -r 69f3e069092b src/wine/dlls/wined3d/directx.c --- a/src/wine/dlls/wined3d/directx.c Tue Oct 30 08:31:01 2012 +0100 +++ b/src/wine/dlls/wined3d/directx.c Tue Oct 30 08:37:39 2012 +0100 @@ -2737,6 +2737,7 @@ TRACE(" IMPLIED: ARB_vertex_array_bgra support (by EXT_vertex_array_bgra).\n"); gl_info->supported[ARB_VERTEX_ARRAY_BGRA] = TRUE; }
- gl_info->supported[ARB_VERTEX_ARRAY_BGRA] = TRUE; if (!gl_info->supported[ARB_TEXTURE_COMPRESSION_RGTC] &&
gl_info->supported[EXT_TEXTURE_COMPRESSION_RGTC]) { TRACE(" IMPLIED: ARB_texture_compression_rgtc support (by EXT_texture_compression_rgtc).\n");
Change the call to glColorPointer:
diff -r 69f3e069092b src/wine/dlls/wined3d/utils.c --- a/src/wine/dlls/wined3d/utils.c Tue Oct 30 08:31:01 2012 +0100 +++ b/src/wine/dlls/wined3d/utils.c Tue Oct 30 08:37:39 2012 +0100 @@ -1601,8 +1601,9 @@
if (gl_info->supported[ARB_VERTEX_ARRAY_BGRA]) {
printf("conversion to GL_BGRA, forced to 4\n"); idx = getFmtIdx(WINED3DFMT_B8G8R8A8_UNORM);
gl_info->formats[idx].gl_vtx_format = GL_BGRA;
gl_info->formats[idx].gl_vtx_format = 4;
}
if (gl_info->supported[ARB_HALF_FLOAT_VERTEX])
Works the same, with the swap between R and B component. I'm surprised, because the debug (printf("conversion to GL_BGRA, forced to 4\n") is only called a few time. Does that mean that this part of the code is only called when declaring *vertex stream* (ie: VAO ?).
Wine does not currently use VAOs (I guess caching vertex streams configurations in VAOs may be a performance optimization, but that's unrelated anyway). That piece of code is in apply_format_fixups(), which is called only on wined3d initialization.
I propose the following changes:
a) in device.c, removes the test which fallbacks to drawPrimitiveSlow if ARB_VERTEX_ARRAY_BGRA is not available b) somewhere just before the glColorPointer call, convert the array with a simple loop
I'm not saying that the fallback can't be improved, but I'd not expect it to be useful all that much. I mean, I thought GL_ARB_vertex_array_bgra support would be widespread.
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #6 from Henri Verbeet hverbeet@gmail.com 2012-10-30 06:04:34 CDT --- (In reply to comment #5)
useful all that much. I mean, I thought GL_ARB_vertex_array_bgra support would be widespread.
It is, the issue is probably mostly that the GeForce FX drivers don't see a lot of development from nvidia anymore. This doesn't just affect ARB_vertex_array_bgra, but also things like sRGB texture decode and FBO blits in some cases. Perhaps it's worth a look if there's a newer "legacy" driver from nvidia, but afaik those mostly just fix compatibility with newer X servers, and don't add support for new extensions.
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #7 from guillaum.bouchard@gmail.com 2012-10-30 08:11:21 CDT --- (In reply to comment #5)
Which GPU and Nvidia driver version are you using? GL_ARB_vertex_array_bgra is reported on my GTX 470 with current drivers (and I think it has been since a long time).
GeForce 5 FX with latest driver from Vvidia 173.14.36
I never debugged WoW so I don't know, but is it really going through Draw[Indexed]PrimitiveUP?
Yes, for everything in the GUI and some really small objects (mainly the floor and spell effects). I guess they used *UP* methods for everything quickly moving or too small to be worth a storage in a VRAM buffer.
Wine does not currently use VAOs (I guess caching vertex streams configurations in VAOs may be a performance optimization, but that's unrelated anyway). That piece of code is in apply_format_fixups(), which is called only on wined3d initialization.
Yes, I finally figured it. I called this VAO because I try to put concept I know (ie: OpenGL) on stuff that I don't know (D3D and the wine api)
I'm not saying that the fallback can't be improved, but I'd not expect it to be useful all that much. I mean, I thought GL_ARB_vertex_array_bgra support would be widespread.
Perhaps I'm just annoying you with a really old GPU. If this is the case, I'm ok to just keep my work as a private patch and apply it to wine for my own purpose, but who know, it may helps others. I'll try to submit a clean enough patch today or tomorrow.
http://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #8 from Henri Verbeet hverbeet@gmail.com 2012-10-30 08:30:10 CDT --- (In reply to comment #7)
Yes, for everything in the GUI and some really small objects (mainly the floor and spell effects). I guess they used *UP* methods for everything quickly moving or too small to be worth a storage in a VRAM buffer.
Yeah, that's a bit of a known issue. We actually want user pointer draws to go through the vertex buffer code as well. It shouldn't be very hard to make that work, although it's not entirely trivial either.
https://bugs.winehq.org/show_bug.cgi?id=32096
--- Comment #9 from Austin English austinenglish@gmail.com --- Is this still an issue in current (1.7.35 or newer) wine? If so, please attach terminal output.
https://bugs.winehq.org/show_bug.cgi?id=32096
joaopa jeremielapuree@yahoo.fr changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |jeremielapuree@yahoo.fr
--- Comment #10 from joaopa jeremielapuree@yahoo.fr --- Does the bug still occur with wine-4.1?
https://bugs.winehq.org/show_bug.cgi?id=32096
guillaum.bouchard@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
--- Comment #11 from guillaum.bouchard@gmail.com --- (In reply to joaopa from comment #10)
Does the bug still occur with wine-4.1?
I don't have the hardware anymore. I think that we can close this bug, we are talking about an issue with a game of GPU which is 15 years old and about a fix for the lake of an OpenGL extension which is mainstream since 13 years.
https://bugs.winehq.org/show_bug.cgi?id=32096
guillaum.bouchard@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|FIXED |ABANDONED
https://bugs.winehq.org/show_bug.cgi?id=32096
Austin English austinenglish@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #12 from Austin English austinenglish@gmail.com --- Closing.