On 7/14/20 19:01, Gabriel Ivăncescu wrote:
You mean it's not forcefully aligned, right? If so, I think that's normal, since the MS ABI mandates that it is 16-byte aligned.
I am not sure this is normal if you explicitly use __attribute__((force_align_arg_pointer)) on the function, especially without even a warning with -Wall. Looks like a probable gcc or mingw bug to me.
For this patch, just for future reference: you should be using movdqu, which is just as fast as movdqa anyway unless the processor is very old. I don't know if the change is still necessary, though.
movdqa / movaps can (or could) work on unaligned addresses on some CPUs, but seem to fault on most. FWIW, I generally prefer mov{a|u}ps for SSE regs, if only to avoid mixing them up with movq like I did.