The bug is in gcc/mingw. The 32-bit Windows ABI doesn't require the stack to be 16-byte aligned, so when building a PE binary, gcc/mingw should already know that the stack isn't aligned; it shouldn't need to be told explicitly.
My understanding is that MinGW mostly gets it right, but some exotic options like `-march=znver4` trigger a broken code path. The hope is that the compiler can be fixed; in the meantime, we can tell people to avoid exotic options. If we simply hide the bug, more of these broken code paths will creep in.
The problem is summarized in [1], especially comment 5.
What gcc actually does is this: it assumes 16-byte stack alignment for i686-w64-mingw32, but *if* SSE is enabled (with `-msse` or anything that implies it), it forces stack realignment on entry, as `-mstackrealign` does (equivalent to applying `__force_align_arg_pointer__` to every function).
This addresses most cases, but for some reason `-mstackrealign` is broken with `-mavx512f` (implied by `-march=znver4`). It also doesn't address the case, described in [1], where the alignment of a type is specified manually.
The right fix for that case, proposed by a gcc developer, is to tell gcc that the stack is not 16-byte aligned, which is what `-mpreferred-stack-boundary=2` does (the value is an exponent, so 2 means 2^2 = 4 bytes). The proposed gcc patch would simply make this the default.
The broken interaction between `-mstackrealign` and `-mavx512f` is a bug that this would hide, and it should absolutely be fixed independently. For the rest of the problem, though, the right fix is to replace `-mstackrealign` (which merely papers over it) with `-mpreferred-stack-boundary=2`.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107
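To make the manually-aligned-type case a bit more concrete, here is a minimal C sketch; the type `vec4` and the function `sum_local_vec4` are hypothetical and not taken from the bug report. A local of a 16-byte-aligned type only gets that alignment if gcc either knows the incoming stack is already 16-byte aligned or realigns the frame itself. Building with something like `i686-w64-mingw32-gcc -O2 -mpreferred-stack-boundary=2` tells gcc to assume only 4 bytes (without `-mincoming-stack-boundary`, the preferred boundary is also used as the assumed incoming one), so it should realign the frame of functions like this one and leave functions without such locals alone.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical type whose alignment is specified manually. */
typedef struct {
    float v[4];
} __attribute__((aligned(16))) vec4;

/* Builds a vec4 on the stack and checks that it got the alignment its type
   promises. This only holds if gcc knows the incoming stack is 16-byte
   aligned or realigns the frame itself; on i686 Windows the caller only
   guarantees 4 bytes, so a compiler that wrongly assumes 16 can place v at
   a misaligned address. */
float sum_local_vec4(void)
{
    vec4 v = {{1.0f, 2.0f, 3.0f, 4.0f}};

    /* Read the address back through a volatile pointer so the optimizer
       cannot fold the check to "true" from the declared alignment. */
    vec4 *volatile p = &v;
    assert((uintptr_t)p % alignof(vec4) == 0);

    float s = 0.0f;
    for (int i = 0; i < 4; i++)
        s += p->v[i];
    return s;
}
```

The explicit check is only for illustration; in real code the symptom is typically a crash on an aligned SSE load or store to the misaligned local.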
Old-style .dll.so libraries are built with the host compiler, and thus follow the host ABI, which assumes that the stack is always 16-byte aligned. That's why we have to use `force_align_arg_pointer`.
It wouldn't make sense to force-align the stack everywhere and then tell the compiler that the stack isn't aligned.
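A minimal sketch of that pattern, assuming a hypothetical function `example_exported_sum` in an old-style .dll.so that Windows code calls directly (an illustration only, not code from the project): since the host compiler assumes a 16-byte-aligned stack on entry, the entry point carries gcc's x86 `force_align_arg_pointer` attribute, so its prologue realigns the stack before any code relying on that assumption runs.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical 16-byte aligned type used by the host-compiled code. */
typedef struct {
    double d[2];
} __attribute__((aligned(16))) pair;

/* Hypothetical entry point of an old-style .dll.so, called from Windows
   code that on i686 only guarantees 4-byte stack alignment. The host
   compiler assumes 16 bytes, so force_align_arg_pointer (an x86 attribute)
   makes gcc emit a prologue that realigns the stack if necessary. */
__attribute__((force_align_arg_pointer))
double example_exported_sum(double a, double b)
{
    pair p = {{a, b}};

    /* Volatile pointer so the check is not folded away at compile time. */
    pair *volatile q = &p;
    assert((uintptr_t)q % alignof(pair) == 0);

    return q->d[0] + q->d[1];
}
```

Functions called from `example_exported_sum` are ordinary host-ABI functions and need no attribute, since the realigned frame restores the 16-byte guarantee for them.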
Oh, I see; I forgot that this problem is already handled that way.
I think `-mpreferred-stack-boundary=2` is a better solution than `force_align_arg_pointer`, since the former should only force stack realignment where it's actually needed, but I'll leave that for a separate change.