On Fri Apr 4 00:12:14 2025 +0000, William Horvath wrote:
might end up not being inlined
FWIW `static FORCEINLINE void ...` should result in it always being inlined, at least with GCC and Clang. `FORCEINLINE` being the Wine macro for `__attribute__((always_inline)) inline`.
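For illustration, here is a minimal sketch of the pattern under discussion: a small per-pixel helper marked always-inline and called from a conversion loop. `SKETCH_FORCEINLINE`, `sketch_l8_to_argb` and `sketch_convert_row` are hypothetical names, not the code from this merge request, and the macro is only a local stand-in for Wine's `FORCEINLINE`.

```c
#include <stdint.h>

/* Assumption: local stand-in for Wine's FORCEINLINE, which comes from Wine's own headers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCEINLINE inline
#endif

/* Hypothetical per-pixel helper: expand an 8-bit value into opaque grey ARGB. */
static SKETCH_FORCEINLINE uint32_t sketch_l8_to_argb(uint8_t v)
{
    return 0xff000000u | ((uint32_t)v << 16) | ((uint32_t)v << 8) | v;
}

/* The helper runs once per pixel, so a non-inlined call would be paid
 * width * height times per surface. */
void sketch_convert_row(const uint8_t *src, uint32_t *dst, unsigned int width)
{
    unsigned int x;

    for (x = 0; x < width; ++x)
        dst[x] = sketch_l8_to_argb(src[x]);
}
```

With plain `static inline` the compiler is free to leave the helper out of line; the attribute removes that choice on GCC and Clang.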
I wrote a test that converts a 4096x4096 pixel source to a 4096x4096 pixel destination surface. I ran it 40 times for each format pair and took the average. These are the results for the function without `FORCEINLINE` and with `FORCEINLINE`:

| Source | Dest | non-inline | inline | diff |
| ------ | ------ | ------ | ------ | ------ |
| D3DFMT_P8 | D3DFMT_A8R8G8B8 | 1.1341057 | 1.1172716 | 0.0168341 |
| D3DFMT_A8P8 | D3DFMT_A8R8G8B8 | 1.2521212 | 1.2365495 | 0.0155717 |
| D3DFMT_Q8W8V8U8 | D3DFMT_A8R8G8B8 | 1.1889339 | 1.1724054 | 0.0165285 |
| D3DFMT_A32B32G32R32F | D3DFMT_A8R8G8B8 | 0.7444296 | 0.7330986 | 0.011331 |
| D3DFMT_P8 | D3DFMT_A32B32G32R32F | 1.0119892 | 0.9935202 | 0.018469 |
| D3DFMT_A8P8 | D3DFMT_A32B32G32R32F | 1.1156599 | 1.0982643 | 0.0173956 |
| D3DFMT_Q8W8V8U8 | D3DFMT_A32B32G32R32F | 1.0075875 | 0.9763689 | 0.0312186 |
| D3DFMT_A8R8G8B8 | D3DFMT_A32B32G32R32F | 1.4340131 | 1.4098957 | 0.0241174 |
| D3DFMT_P8 | D3DFMT_Q8W8V8U8 | 1.2236793 | 1.1992818 | 0.0243975 |
| D3DFMT_A8P8 | D3DFMT_Q8W8V8U8 | 1.3356345 | 1.3049285 | 0.030706 |
| D3DFMT_A32B32G32R32F | D3DFMT_Q8W8V8U8 | 0.7572374 | 0.7376824 | 0.019555 |
| D3DFMT_A8R8G8B8 | D3DFMT_Q8W8V8U8 | 1.6424342 | 1.6185154 | 0.0239188 |

So there is a consistent, if small, performance difference between the two: `FORCEINLINE` is faster for every format pair, by roughly 1 to 3%.
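The measurement described above (run the conversion 40 times, take the mean) can be sketched roughly as follows. This is not the actual test: `sketch_average_seconds`, `sketch_bench_fn` and `sketch_count_call` are hypothetical names, and the use of `clock()` is an assumption, since the timer and units of the real test are not stated here.

```c
#include <time.h>

/* Hypothetical callback type: one call performs one full conversion. */
typedef void (*sketch_bench_fn)(void *ctx);

/* Placeholder workload that only counts invocations; in a real test this
 * would be the 4096x4096 format conversion. */
void sketch_count_call(void *ctx)
{
    ++*(unsigned int *)ctx;
}

/* Run fn 'runs' times and return the mean time per run in seconds.
 * Assumption: clock() is precise enough for runs of this length. */
double sketch_average_seconds(sketch_bench_fn fn, void *ctx, unsigned int runs)
{
    double total = 0.0;
    unsigned int i;

    if (!runs)
        return 0.0;

    for (i = 0; i < runs; ++i)
    {
        clock_t start = clock();

        fn(ctx);
        total += (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return total / runs;
}
```

Calling `sketch_average_seconds(fn, ctx, 40)` once per source/destination pair, for a build with and one without the attribute, would yield the two timing columns. Note that `clock()` reports CPU time on POSIX systems but wall-clock time with the Microsoft C runtime, so a real comparison should use the same timer on both sides.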
Compared to native, we're pretty slow either way. Here are the results from native:

| Source | Dest | Time |
| ------ | ------ | ------ |
| D3DFMT_P8 | D3DFMT_A8R8G8B8 | 0.2615006 |
| D3DFMT_A8P8 | D3DFMT_A8R8G8B8 | 0.2651899 |
| D3DFMT_Q8W8V8U8 | D3DFMT_A8R8G8B8 | 0.2894723 |
| D3DFMT_A32B32G32R32F | D3DFMT_A8R8G8B8 | 0.2726750 |
| D3DFMT_P8 | D3DFMT_A32B32G32R32F | 0.1033063 |
| D3DFMT_A8P8 | D3DFMT_A32B32G32R32F | 0.1009229 |
| D3DFMT_Q8W8V8U8 | D3DFMT_A32B32G32R32F | 0.0864582 |
| D3DFMT_A8R8G8B8 | D3DFMT_A32B32G32R32F | 0.0719979 |
| D3DFMT_P8 | D3DFMT_Q8W8V8U8 | 0.3161181 |
| D3DFMT_A8P8 | D3DFMT_Q8W8V8U8 | 0.3253104 |
| D3DFMT_A32B32G32R32F | D3DFMT_Q8W8V8U8 | 0.2874802 |
| D3DFMT_A8R8G8B8 | D3DFMT_Q8W8V8U8 | 0.2879247 |

I think it makes sense to use `FORCEINLINE` here.