By not using floorf from external library ucrtbase.dll the performance of GdipDrawImagePointsRect was improved.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53947
After applying patch the game performance is similar to native gdiplus.dll.
Result of oprofile before patch: ``` samples % image name app name symbol name 895705 33.4761 ucrtbase.dll.so wine floor 351683 13.1438 no-vmlinux wine /no-vmlinux 330995 12.3706 gdiplus.dll.so wine resample_bitmap_pixel 300093 11.2157 win32u.so wine blend_rects_8888 272645 10.1898 gdiplus.dll.so wine GdipDrawImagePointsRect 137421 5.1360 gdiplus.dll.so wine sample_bitmap_pixel 112476 4.2037 gdiplus.dll.so wine convert_32bppARGB_to_32bppPARGB 35455 1.3251 no-vmlinux wineserver /no-vmlinux 35223 1.3164 no-vmlinux wine64-preloader /no-vmlinux 30252 1.1306 libc.so.6 wine /usr/lib/i386-linux-gnu/libc.so.6 ``` and after patch: ``` samples % image name app name symbol name 248581 18.3279 no-vmlinux wine /no-vmlinux 200141 14.7564 gdiplus.dll.so wine resample_bitmap_pixel 199840 14.7342 win32u.so wine blend_rects_8888 179135 13.2076 gdiplus.dll.so wine GdipDrawImagePointsRect 105785 7.7995 no-vmlinux wine-preloader /no-vmlinux 89534 6.6013 gdiplus.dll.so wine sample_bitmap_pixel 83947 6.1894 no-vmlinux wine64-preloader /no-vmlinux 74132 5.4658 gdiplus.dll.so wine convert_32bppARGB_to_32bppPARGB 22978 1.6942 anon (tgid:374066 range:0x7bc10000-0x7bffffff) wine anon (tgid:374066 range:0x7bc10000-0x7bffffff) 19747 1.4559 libc.so.6 wine /usr/lib/i386-linux-gnu/libc.so.6 14570 1.0742 win32u.so wine solid_rects_32 ```
Can we stop using floorf and ceilf from `math.h`?
-- v2: gdiplus: Boost performance of GdipDrawImagePointsRect.
From: Bartosz Kosiorek gang65@poczta.onet.pl
By not using floorf from external library ucrtbase.dll the performance of GdipDrawImagePointsRect was improved.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53947 --- dlls/gdiplus/graphics.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/dlls/gdiplus/graphics.c b/dlls/gdiplus/graphics.c index 371629a5bef..c790633b71d 100644 --- a/dlls/gdiplus/graphics.c +++ b/dlls/gdiplus/graphics.c @@ -1055,22 +1055,38 @@ static ARGB resample_bitmap_pixel(GDIPCONST GpRect *src_rect, LPBYTE bits, UINT } case InterpolationModeNearestNeighbor: { - FLOAT pixel_offset; + /* Using floorf from ucrtbase.dll is extremaly slow compared to own implementation (casting to INT) */ + INT pixel_with_offset_x, pixel_with_offset_y; switch (offset_mode) { default: case PixelOffsetModeNone: case PixelOffsetModeHighSpeed: - pixel_offset = 0.5; - break; + if (point->X >= 0.0f) + pixel_with_offset_x = (INT)(point->X + 0.5f); + else + pixel_with_offset_x = floorf(point->X + 0.5f);
+ if (point->Y >= 0.0f) + pixel_with_offset_y = (INT)(point->Y + 0.5f); + else + pixel_with_offset_y = floorf(point->Y + 0.5f); + break; case PixelOffsetModeHalf: case PixelOffsetModeHighQuality: - pixel_offset = 0.0; + if (point->X >= 0.0f) + pixel_with_offset_x = (INT)(point->X); + else + pixel_with_offset_x = floorf(point->X); + + if (point->Y >= 0.0f) + pixel_with_offset_y = (INT)(point->Y); + else + pixel_with_offset_y = floorf(point->Y); break; } return sample_bitmap_pixel(src_rect, bits, width, height, - floorf(point->X + pixel_offset), floorf(point->Y + pixel_offset), attributes); + pixel_with_offset_x, pixel_with_offset_y, attributes); }
}
Nikolay Sivov (@nsivov) commented about dlls/gdiplus/graphics.c:
} case InterpolationModeNearestNeighbor: {
FLOAT pixel_offset;
/* Using floorf from ucrtbase.dll is extremaly slow compared to own implementation (casting to INT) */
I meant, does resample_bitmap_pixel() need to use GpPointF input at all.
On Tue Nov 14 22:01:01 2023 +0000, Fabian Maurer wrote:
It looks like floor does a lot more, for example handling all the corner cases. Are you sure this isn't a problem here? FWIW, just having the function inlined already helps a lot. Btw, is there a reason you changed the switch/case identation? IMHO it makes reading the patch harder.
Can you provide fix with inlining? I would like to check if it is fixing the original bug report.
On Tue Nov 14 21:27:29 2023 +0000, Nikolay Sivov wrote:
I meant, does resample_bitmap_pixel() need to use GpPointF input at all.
I think yes. According to documentation:
``` Consider the pixel in the upper-left corner of an image with address (0, 0). With PixelOffsetModeNone, the pixel covers the area between –0.5 and 0.5 in both the x and y directions; that is, the pixel center is at (0, 0). With PixelOffsetModeHalf, the pixel covers the area between 0 and 1 in both the x and y directions; that is, the pixel center is at (0.5, 0.5). ``` https://learn.microsoft.com/en-us/windows/win32/api/gdiplusenums/ne-gdipluse...
On Tue Nov 14 22:05:53 2023 +0000, Bartosz Kosiorek wrote:
Can you provide fix with inlining? I would like to check if it is improving performance of the original bug report.
Sure attaching it here: [0001-test.patch](/uploads/6bf7ffd561bd6bbbc1e72966e5255b98/0001-test.patch)
I measured from starting wine until the picture in the game begins to scroll up 1) this patch: ~15sec 2) master: 47 sec 3) your patch: 47 sec
For me your patch doesn't help at all. Compiled from source with -O2.
On Tue Nov 14 22:48:08 2023 +0000, Fabian Maurer wrote:
Sure attaching it here: [0001-test.patch](/uploads/6bf7ffd561bd6bbbc1e72966e5255b98/0001-test.patch) I measured from starting wine until the picture in the game begins to scroll up
- this patch: ~15sec
- master: 47 sec
- your patch: 47 sec
For me your patch doesn't help at all. Compiled from source with -O2.
Okay, I didn't notice you secretly updated your patch to use floorf again. This completely negates the performance gains.
FWIW, I created bug 55899 about a performance regression where the stack isn't aligned. Maybe this causes the issues with floor? Because having both your/mine patch for this issue and the one for bug 55899 doesn't make it any faster, only one is enough. But I'd lie if I said I know what#s going on here, I'm pretty confused.