On Fri Dec 2 16:13:04 2022 +0000, Bartosz Kosiorek wrote:
I will check that. I think inlining should be applied anyway. Thanks for the suggestion. Still, with unrolling I don't need to calculate the modulo on every iteration, and the array indexes are hardcoded (no need to calculate them). Unfortunately, I am not an expert at checking compiler output. Could you please give me some advice on that?
Generally, I get the best results with inlining, loop unrolling, and the scale optimization.
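To make the unrolling point concrete, here is a minimal sketch, not the exact code from the merge request. It assumes a 3x2 affine matrix stored as six floats in the order `[m11, m12, m21, m22, dx, dy]`; the names `multiply_looped` and `multiply_unrolled` are hypothetical and used only for illustration. The looped form computes `i % 2` and derived indexes on every iteration, while the unrolled form hardcodes every index.

```c
#include <string.h>

/* Hypothetical looped form: the modulo and the array indexes are
 * recomputed on every iteration. */
static void multiply_looped(const float *left, const float *right, float *out)
{
    float temp[6];
    int i, odd;

    for (i = 0; i < 6; i++) {
        odd = i % 2;
        temp[i] = left[i - odd] * right[odd]
                + left[i - odd + 1] * right[odd + 2]
                + (i >= 4 ? right[odd + 4] : 0.0f);
    }
    /* Copy via a temporary so that out may alias left or right. */
    memcpy(out, temp, sizeof(temp));
}

/* Hypothetical unrolled form: no loop counter, no modulo, and all
 * indexes are compile-time constants. Marked inline so the compiler
 * may expand it at each call site. */
static inline void multiply_unrolled(const float *left, const float *right, float *out)
{
    float temp[6];

    temp[0] = left[0] * right[0] + left[1] * right[2];
    temp[1] = left[0] * right[1] + left[1] * right[3];
    temp[2] = left[2] * right[0] + left[3] * right[2];
    temp[3] = left[2] * right[1] + left[3] * right[3];
    temp[4] = left[4] * right[0] + left[5] * right[2] + right[4];
    temp[5] = left[4] * right[1] + left[5] * right[3] + right[5];

    memcpy(out, temp, sizeof(temp));
}
```

Both functions should produce the same six values for the same inputs. Whether the unrolled form is actually faster depends on the compiler and optimization level, since an optimizing compiler may unroll the short loop on its own.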
Lowest running times on 64-bit Linux (I ran each test about 10 times and took the lowest value):
Wine gdiplus.dll **without** optimizations:
* 500 Matrix Scaling time (seconds): 0.31s
* 700 Matrix Multiplying time (seconds): 0.20s

Wine gdiplus.dll **without** optimizations with **inlining** `matrix_multiply` (Scaling is also using `matrix_multiply`):
* 500 Matrix Scaling time (seconds): 0.21s
* 700 Matrix Multiplying time (seconds): 0.16s

Wine gdiplus.dll **with** optimizations and **inlining** (current Merge Request):
* 500 Matrix Scaling time (seconds): 0.07s
* 700 Matrix Multiplying time (seconds): 0.11s