https://bugs.winehq.org/show_bug.cgi?id=55899
--- Comment #6 from Gabriel Ivăncescu gabrielopcode@gmail.com ---
-mpreferred-stack-boundary is the alignment GCC prefers to keep the stack at. -mincoming-stack-boundary is the alignment GCC assumes for the incoming stack from external callers (i.e. code it cannot "see" or analyze itself). Both values are exponents of 2, so a value of 2 means 4-byte alignment and a value of 4 means 16-byte alignment.
-mpreferred-stack-boundary implies -mincoming-stack-boundary if the latter is not set explicitly (it will be set to the same value).
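To make the exponent arithmetic concrete, here is a minimal C sketch. The helper names stack_boundary_bytes and is_stack_aligned are made up for illustration and are not part of GCC or Wine; they only show how a boundary value N maps to 2^N bytes and how an address can be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a -m*-stack-boundary=N value to bytes (2^N). */
static size_t stack_boundary_bytes(unsigned n)
{
    assert(n < sizeof(size_t) * 8); /* keep the shift well-defined */
    return (size_t)1 << n;
}

/* Hypothetical helper: nonzero if addr is a multiple of 2^n bytes. */
static int is_stack_aligned(uintptr_t addr, unsigned n)
{
    return (addr & (stack_boundary_bytes(n) - 1)) == 0;
}
```

With these, a boundary value of 2 gives 4 bytes and a value of 4 gives 16 bytes; an address such as 0x1004 satisfies the former but not the latter.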
On 32-bit Windows, the incoming stack is aligned to 4 bytes only (it's not guaranteed to be more), but on Linux it's 16 bytes. There's a bug with MinGW where it doesn't default to -mpreferred-stack-boundary=2 (see MR !4030).
I don't know why it affects performance, though; it might be due to that bug. What I mean is that it's totally normal for -mpreferred-stack-boundary=2 to be better, but I don't understand why it's "fast" when it always aligns the stack (reverting the commit).
Anyway I think this is another argument why we'd want to have that MR in… instead of waiting for upstream to fix it (if ever).