-- v4: gdiplus: move pointer calculation outside inner loop to improve performance
From: Bartosz Kosiorek gang65@poczta.onet.pl
--- dlls/gdiplus/graphics.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/dlls/gdiplus/graphics.c b/dlls/gdiplus/graphics.c index e3b5661fd67..f167e88fa46 100644 --- a/dlls/gdiplus/graphics.c +++ b/dlls/gdiplus/graphics.c @@ -422,11 +422,10 @@ static GpStatus alpha_blend_bmp_pixels(GpGraphics *graphics, INT dst_x, INT dst_
for (y=0; y<src_height; y++) { + ARGB *src_row = (ARGB*)(src + src_stride * y); for (x=0; x<src_width; x++) { - ARGB dst_color, src_color; - src_color = ((ARGB*)(src + src_stride * y))[x]; - + ARGB src_color = src_row[x]; if (comp_mode == CompositingModeSourceCopy) { if (!(src_color & 0xff000000)) @@ -436,6 +435,7 @@ static GpStatus alpha_blend_bmp_pixels(GpGraphics *graphics, INT dst_x, INT dst_ } else { + ARGB dst_color; if (!(src_color & 0xff000000)) continue;
Jeffrey Smith (@whydoubt) commented about dlls/gdiplus/graphics.c:
GdipBitmapSetPixel(dst_bitmap, x+dst_x, y+dst_y, color_over_fgpremult(dst_color, src_color)); else GdipBitmapSetPixel(dst_bitmap, x+dst_x, y+dst_y, color_over(dst_color, src_color)); }
While this goes beyond your original scope, I think it may be worth cleaning up the inner loop more extensively while you're at it.
```suggestion:-19+0 ARGB dst_color, src_color = src_row[x];
if (comp_mode == CompositingModeSourceCopy) dst_color = (!(src_color & 0xff000000)) ? 0 : src_color; else if (!(src_color & 0xff000000)) continue; else { GdipBitmapGetPixel(dst_bitmap, x+dst_x, y+dst_y, &dst_color); if (fmt & PixelFormatPAlpha) dst_color = color_over_fgpremult(dst_color, src_color); else dst_color = color_over(dst_color, src_color); }
GdipBitmapSetPixel(dst_bitmap, x+dst_x, y+dst_y, dst_color); ```
On Fri Aug 11 09:16:39 2023 +0000, Nikolay Sivov wrote:
I asked because usually claiming something is faster also needs some performance numbers.
Performance claims do hint at performance testing results. Maybe rewording the commit message to avoid direct mention of performance would be a bit less loaded:
`gdiplus: Move pointer calculation to reduce multiplications in alpha_blend_bmp_pixels.`
On Fri Aug 11 18:27:06 2023 +0000, Jeffrey Smith wrote:
Performance claims do hint at performance testing results. Maybe rewording the commit message to avoid direct mention of performance would be a bit less loaded: `gdiplus: Move pointer calculation to reduce multiplications in alpha_blend_bmp_pixels.`
I think this change can be made without performance testing.
That said, repeatedly calling GdipBitmapGetPixel and GdipBitmapSetPixel is never going to be efficient. This should probably do something like convert_pixels in image.c, including special cases for common format combinations.
This merge request was approved by Esme Povirk.
On Fri Aug 11 20:14:26 2023 +0000, Esme Povirk wrote:
I think this change can be made without performance testing. That said, repeatedly calling GdipBitmapGetPixel and GdipBitmapSetPixel is never going to be efficient. This should probably do something like convert_pixels in image.c, including special cases for common format combinations.
I will create separate PR for that.
On Sat Aug 12 14:32:10 2023 +0000, Bartosz Kosiorek wrote:
I will create separate PR for that.
I expect that any decent compiler is going to move the calculation out of the inner loop, so this won't make any change in performance. Doing micro-optimizations at the C level is usually a waste of time.
On Mon Aug 14 15:03:01 2023 +0000, Alexandre Julliard wrote:
I expect that any decent compiler is going to move the calculation out of the inner loop, so this won't make any change in performance. Doing micro-optimizations at the C level is usually a waste of time.
Running a mocked-up test on godbolt shows: - clang results in the same assembly with or without the change. - gcc also moves the calculation out of the inner loop (though the changed code does result in slightly shorter assembly).
So yeah, there is no meaningful gain here.
On Mon Aug 14 16:34:02 2023 +0000, Jeffrey Smith wrote:
Running a mocked-up test on godbolt shows:
- clang results in the same assembly with or without the change.
- gcc also moves the calculation out of the inner loop (though the
changed code does result in slightly shorter assembly). So yeah, there is no meaningful gain here.
Thanks. Closing it then
This merge request was closed by Bartosz Kosiorek.