On Fri Dec 2 17:42:17 2022 +0000, Bartosz Kosiorek wrote:
Generally, I get the best results with inlining, loop unrolling, and the scale optimization. Below are the lowest running times on 64-bit Linux (I ran each benchmark about 10 times and took the lowest value).

Wine gdiplus.dll **without** optimizations:
- 500 x `GdipScaleMatrix`: 0.31 s
- 700 x `GdipMultiplyMatrix`: 0.20 s

Wine gdiplus.dll **without** optimizations, but with `matrix_multiply` **inlined** (scaling also uses `matrix_multiply`):
- 500 x `GdipScaleMatrix`: 0.21 s
- 700 x `GdipMultiplyMatrix`: 0.16 s

Wine gdiplus.dll **with** optimizations and **inlining** (this merge request):
- 500 x `GdipScaleMatrix`: 0.07 s
- 700 x `GdipMultiplyMatrix`: 0.11 s
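The "best of about 10 runs" measurement described above can be reproduced with a small harness along these lines. This is a sketch assuming a POSIX system with `clock_gettime(CLOCK_MONOTONIC, ...)`; `best_of` and `sample_workload` are hypothetical names, and the workload is only a placeholder for whichever gdiplus call is being timed. Taking the minimum rather than the mean discards trials slowed down by scheduling or cache noise.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

typedef void (*workload_fn)(void);

/* Hypothetical placeholder workload; substitute the call being benchmarked. */
static volatile float sink = 1.0f;
static void sample_workload(void)
{
    sink = sink * 0.5f + 1.0f;
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls `fn` `calls` times per trial, repeats `trials` times, and returns
 * the lowest per-trial time in seconds (or -1.0 if trials < 1). */
static double best_of(workload_fn fn, int calls, int trials)
{
    double best = -1.0;
    for (int t = 0; t < trials; t++) {
        double start = now_seconds();
        for (int i = 0; i < calls; i++)
            fn();
        double elapsed = now_seconds() - start;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

For example, `best_of(sample_workload, 500, 10)` corresponds to "500 x, lowest of 10 runs" with the placeholder workload.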
Either way, I don't think the loop makes `matrix_multiply` any clearer, and I can't think of any reasonable way the extra integer math and branches would improve performance, so I think that part of the change is fine.
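For reference, here is a minimal sketch of the two styles of `matrix_multiply` under discussion. It assumes the GDI+ 3x2 affine layout `{m11, m12, m21, m22, dx, dy}` applied to row vectors, so `out = left * right`; the names `multiply_loop` and `multiply_unrolled` are hypothetical, and the loop body is modeled on the "extra integer math and branches" mentioned above rather than copied from Wine. Both versions write through a temporary because a caller may pass the same array as input and output.

```c
#include <string.h>

/* Assumed layout {m11, m12, m21, m22, dx, dy}, i.e. the 3x3 matrix
 *   | m11 m12 0 |
 *   | m21 m22 0 |
 *   | dx  dy  1 |
 * applied to row vectors, so out = left * right. */

/* Loop form: per-element index arithmetic (i % 2) and a branch (i >= 4). */
static void multiply_loop(const float *left, const float *right, float *out)
{
    float temp[6];
    for (int i = 0; i < 6; i++) {
        int odd = i % 2;
        temp[i] = left[i - odd] * right[odd]
                + left[i - odd + 1] * right[odd + 2]
                + (i >= 4 ? right[odd + 4] : 0.0f); /* translation row only */
    }
    memcpy(out, temp, sizeof(temp)); /* out may alias left or right */
}

/* Unrolled form: the same six dot products written out explicitly. */
static void multiply_unrolled(const float *left, const float *right, float *out)
{
    float temp[6];
    temp[0] = left[0] * right[0] + left[1] * right[2];
    temp[1] = left[0] * right[1] + left[1] * right[3];
    temp[2] = left[2] * right[0] + left[3] * right[2];
    temp[3] = left[2] * right[1] + left[3] * right[3];
    temp[4] = left[4] * right[0] + left[5] * right[2] + right[4];
    temp[5] = left[4] * right[1] + left[5] * right[3] + right[5];
    memcpy(out, temp, sizeof(temp)); /* out may alias left or right */
}
```

With exactly representable inputs both functions give identical results; the unrolled form simply drops the `%` and `?:` from every element.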
For `GdipScaleMatrix` it's a little less convincing, because the new version of the code is less clear. I'm curious what happens if you apply only the second patch in the series; I suspect that constant propagation combined with the second patch may make the first patch redundant.
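To make the constant-propagation question concrete, here is a sketch of scaling done two ways: through a generic multiply with the scale matrix `{sx, 0, 0, sy, 0, 0}`, and with that product written out by hand. The names `scale_generic`, `scale_direct`, and `enum order` are hypothetical stand-ins for the two GDI+ matrix orders, with the same assumed `{m11, m12, m21, m22, dx, dy}` layout. One caveat worth checking in the generated code: under strict IEEE-754 semantics a compiler generally may not fold `a + x * 0.0f` to `a` (the product is NaN for infinite `x`, and signed zeros can change the sum), so inlining alone may not reduce the generic path all the way to the direct one.

```c
/* Hypothetical order flag, modeled on GDI+ prepend/append semantics. */
enum order { ORDER_PREPEND, ORDER_APPEND };

/* out = left * right for the assumed layout; out may alias an input. */
static void multiply(const float *left, const float *right, float *out)
{
    float t[6];
    t[0] = left[0] * right[0] + left[1] * right[2];
    t[1] = left[0] * right[1] + left[1] * right[3];
    t[2] = left[2] * right[0] + left[3] * right[2];
    t[3] = left[2] * right[1] + left[3] * right[3];
    t[4] = left[4] * right[0] + left[5] * right[2] + right[4];
    t[5] = left[4] * right[1] + left[5] * right[3] + right[5];
    for (int i = 0; i < 6; i++)
        out[i] = t[i];
}

/* Generic path: build a scale matrix and multiply. */
static void scale_generic(float *m, float sx, float sy, enum order order)
{
    const float s[6] = { sx, 0.0f, 0.0f, sy, 0.0f, 0.0f };
    if (order == ORDER_APPEND)
        multiply(m, s, m);   /* m = m * S */
    else
        multiply(s, m, m);   /* m = S * m */
}

/* Direct path: the same products with the zero terms removed by hand. */
static void scale_direct(float *m, float sx, float sy, enum order order)
{
    if (order == ORDER_APPEND) {
        /* m * S scales the columns, including the translation. */
        m[0] *= sx; m[1] *= sy;
        m[2] *= sx; m[3] *= sy;
        m[4] *= sx; m[5] *= sy;
    } else {
        /* S * m scales the first two rows; translation is unchanged. */
        m[0] *= sx; m[1] *= sx;
        m[2] *= sy; m[3] *= sy;
    }
}
```

Comparing the assembly of `scale_generic` with `scale_direct` at the build's optimization level would show directly how much of the hand-specialization the compiler already recovers.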