On Tue Jul 9 11:53:41 2024 +0000, Alexandros Frantzis wrote:
In a few synthetic benchmarks I ran locally, this change has a significant performance cost: 1.7x-3x that of just doing the memcpy, with higher overhead for smaller regions. Switching to a manual 32-bit pixel copy that sets the alpha as it goes is a definite improvement, with the results I am seeing in the range of 1.2x-2x compared to just the memcpy:
    width = rc.right - rc.left;
    for (x = 0; x < height * width; ++x)
        ((UINT32 *)dst)[x] = ((UINT32 *)src)[x] | 0xff000000;
So perhaps it's worth switching to such a loop?
Sure, now I'm wondering whether we could/should move alpha setting to win32u as well.
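For reference, the loop quoted above could be factored into a standalone helper along these lines. This is only a sketch: the name `copy_pixels_opaque` and its signature are hypothetical and not taken from the patch, and it assumes tightly packed 32-bit pixels (no row padding) with alpha in the top 8 bits of each value, as the quoted loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch under review.
 * Copies `count` 32-bit pixels from src to dst and forces the top
 * 8 bits (the alpha channel, by assumption) of each pixel to 0xff.
 * Assumes the buffers do not overlap and rows are tightly packed,
 * so the caller can pass count = width * height. */
static void copy_pixels_opaque(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i;

    assert(dst && src);

    for (i = 0; i < count; ++i)
        dst[i] = src[i] | 0xff000000u;
}
```

A caller would pass the pixel count of the region, for example `copy_pixels_opaque(dst, src, (size_t)width * height)`. Doing the copy and the alpha fix-up in a single pass touches each pixel once, which is presumably why it does better than a memcpy followed by a separate alpha pass.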