https://bugs.winehq.org/show_bug.cgi?id=50898
Bug ID: 50898 Summary: Improve performance for RGB lookups into color tables conversion Product: Wine Version: 6.5 Hardware: x86-64 OS: Linux Status: NEW Severity: enhancement Priority: P2 Component: gdi32 Assignee: wine-bugs@winehq.org Reporter: gabrielopcode@gmail.com Distribution: ---
Created attachment 69710 --> https://bugs.winehq.org/attachment.cgi?id=69710 Generate a lookup cache for RGB into color table conversion
This improves the performance of color conversion from RGB to palette significantly by generating a lookup cache for 5-bit-per-color RGB values (centered) into the palette, before doing the conversion, for older applications and games. The comments in the patch should describe the algorithm I came up with for large color tables.
Note that Windows also quantizes the lookup to 5-bit-per-color already, so it very likely does something similar (but seems to do more caching based on my tests); we have this quantization in wine's code to match Windows, but we don't take advantage of it, so it's wasted.
Generating it by looking through the color table for each entry in the lookup cache wouldn't be a good idea if the color table is large. The lookup cache itself is 32768 entries, which turns out to be the equivalent of a 256x128 image. The current algorithm used by wine works like that already: it goes through each pixel in the image, and for each pixel, it looks up every color in the color table, which is extremely slow. Generating the cache like that would mean images have to be at least 256x128 for it to have any benefit, otherwise it would be a performance regression instead, and that's not very reasonable to expect of older applications, and it wouldn't provide much.
So, the table generation algorithm is much more efficient than that for large amount of colors in the table. In my synthetic tests generating 4096 tables with various color tables of 256 colors, it turned out to be at least 4x faster, sometimes even 4.5x depending on color table (even a purely random color table), for example a random table taking about 14406ms instead of 62056ms on my CPU.
In fact, the algorithm is not far from being constant in terms of amount of colors in the table, taking only 50% more time from 16 colors to 256. So basically increasing the amount of colors from 16 to 256 doesn't make that much difference, since each color usually fills its own part of the lookup cache, which is not filled again (unless distance is shorter due to quantization, but that's only for one generation anyway). In contrast, going through each color in the table for each entry (the original algorithm) would massively increase the amount of time spent (from less than 10 seconds, up to more than a minute, or 6x slower).
For a few benchmark examples: a 256-color table with the algorithm takes about 14000ms to do 4096 iterations on my CPU, a 40-color table takes about 12000ms, so it's only 17% slower with this algorithm. The 40 colors seems to be the cutoff on my CPU where the algorithms start to differ in performance, with the straight one being faster at lower amount of colors, presumably due to better cache locality and branch prediction.
A fast and easy way to "see this in action" is to download Winamp, then launch it and right-click on the top, click on the "Nullsoft Winamp..." menu item and then go to the Credits tab. Depending on your CPU you can clearly see the FPS difference and/or CPU usage for one core. (Note that it still has some fps drops, this is normal and happens on Windows as well, even when the CPU core is not maxed out, it's inherent in the app)
http://web.archive.org/web/20180902190237/http://winamparchive.org/dl/winamp...
Further improvements would require caching it somehow between calls, but I believe that requires either more fragile changes to gdi32 code, or semi-hacks to deal with a typical applications' expectations, so this should be a good start for now.
https://bugs.winehq.org/show_bug.cgi?id=50898
Gabriel Ivăncescu gabrielopcode@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |patch, performance
https://bugs.winehq.org/show_bug.cgi?id=50898
winetest@luukku.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |winetest@luukku.com
--- Comment #1 from winetest@luukku.com --- maybe add keyword download since it can be tested.
https://bugs.winehq.org/show_bug.cgi?id=50898
Gabriel Ivăncescu gabrielopcode@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |download
https://bugs.winehq.org/show_bug.cgi?id=50898
--- Comment #2 from Gabriel Ivăncescu gabrielopcode@gmail.com --- So turns out the walk is slightly wrong and I'll revise it at some point, but for now I'm thinking of some alternatives that should still yield some nice gains in practice (and might be useful in conjunction also).
https://bugs.winehq.org/show_bug.cgi?id=50898
Gabriel Ivăncescu gabrielopcode@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Fixed by SHA1| |02a94ecc860e052b504928dc06e | |29aa9af93fd74 Status|NEW |RESOLVED
--- Comment #3 from Gabriel Ivăncescu gabrielopcode@gmail.com --- Should be vastly improved by 02a94ecc860e052b504928dc06e29aa9af93fd74, so I'm marking this as fixed.
https://bugs.winehq.org/show_bug.cgi?id=50898
Gabriel Ivăncescu gabrielopcode@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #69710|0 |1 is obsolete| |
--- Comment #4 from Gabriel Ivăncescu gabrielopcode@gmail.com --- Created attachment 69858 --> https://bugs.winehq.org/attachment.cgi?id=69858 Generate entire lookup table and globally cache it before doing the conversion.
However, since it's a performance bug, if someone wants to experiment with the other (fixed) algorithm (generating entire lookup cache via different algorithm and then globally caching it, before doing the conversion), I'm attaching those patches for records; should apply now on top of the current method (replacing it).
If these patches have vast improvements in some app compared to upstream wine, feel free to reopen or email me so I can try to investigate it.
https://bugs.winehq.org/show_bug.cgi?id=50898
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED
--- Comment #5 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 6.7.