On 9/14/21 1:01 PM, Rémi Bernon wrote:
And what about using Intel intrinsics? For instance:
#ifdef __SSE2__
#ifdef __i386__
    if (sse2_supported)
#endif
    {
        __m128i x = _mm_set1_epi64x(v);
        while (n >= 64)
        {
            _mm_store_si128((__m128i *)(d + n - 64), x);
            _mm_store_si128((__m128i *)(d + n - 48), x);
            _mm_store_si128((__m128i *)(d + n - 32), x);
            _mm_store_si128((__m128i *)(d + n - 16), x);
            n -= 64;
        }
        if (n >= 32)
        {
            _mm_store_si128((__m128i *)(d + n - 32), x);
            _mm_store_si128((__m128i *)(d + n - 16), x);
        }
        return;
    }
#endif
In any case, if SSE is disabled at compile time, this code cannot use the SSE2 path at runtime even when the CPU supports it, which was possible with the assembly function.
Is this something we would like to have?
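To make the compile-time limitation concrete, here is a minimal, self-contained sketch (not the code from the patch). The names fill_pattern and the always-true sse2_supported flag are hypothetical stand-ins, and unaligned stores are used so that no alignment assumption is needed. When __SSE2__ is not defined, the intrinsic branch is simply not compiled, so the runtime flag has nothing left to select:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical runtime flag; a real implementation would set it from CPUID. */
static int sse2_supported = 1;

/* Fill n bytes at d with the repeating 8-byte pattern v. */
static void fill_pattern(unsigned char *d, uint64_t v, size_t n)
{
#ifdef __SSE2__
    /* This branch only exists when the compiler was allowed to emit SSE2. */
    if (sse2_supported)
    {
        __m128i x = _mm_set1_epi64x((long long)v);
        while (n >= 16)
        {
            _mm_storeu_si128((__m128i *)d, x);
            d += 16;
            n -= 16;
        }
    }
#endif
    /* Generic path: also handles whatever the SSE2 loop left over. */
    while (n >= 8)
    {
        memcpy(d, &v, 8);
        d += 8;
        n -= 8;
    }
    if (n) memcpy(d, &v, n);
}
```

Built with SSE disabled, only the generic loop remains in the binary, whatever sse2_supported says at runtime.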
I don't think this is portable. A quick test shows that it doesn't compile with x86_64-w64-mingw on my machine.
Thanks, Piotr