Jinoh Kang (@iamahuman) commented about dlls/ntdll/large_int.c:
     return udivmod(a, b, NULL);
 }
+
+LONGLONG __regs__allshl( LONGLONG a, unsigned char b )
+{
+    const LARGE_INTEGER x = { .QuadPart = a };
+    LARGE_INTEGER ret;
+
+    if (b >= 64)
+        return 0;
It appears that GCC's optimizer has a hard time dealing with a mix of full (64-bit) and partial (32-bit) writes. Compare:

- https://godbolt.org/z/KvdGfr4bY (original)

With:

- https://godbolt.org/z/vG5TGrhaY (64-bit return elided)

I'd suggest folding the special case into the if statement below. We already lose performance by thunking, but it's still a good idea not to slow down (or, rather, amplify the I-cache usage of) a builtin too much.

Meanwhile, clang does not appear to suffer from this problem.

--
https://gitlab.winehq.org/wine/wine/-/merge_requests/375#note_3464
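A minimal sketch of what the suggested restructuring might look like, assuming the elided "if statement below" splits on `b >= 32` and builds the result from the two 32-bit halves (the rest of the function is not shown in the diff). `large_int_sketch` and `allshl_sketch` are hypothetical stand-ins for `LARGE_INTEGER` and `__regs__allshl`, not Wine code; the halves are unsigned here to avoid undefined behaviour from shifting into a sign bit. The `b >= 64` case is folded into the `b >= 32` branch, so every path writes only the two 32-bit halves and there is no separate full 64-bit `return 0` for GCC to reconcile. Whether this actually improves GCC's output would still need to be checked on godbolt, as in the comparison links above.

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER: a 64-bit value overlaid with
 * its two 32-bit halves. Field order follows the target's byte order. */
typedef union
{
    uint64_t quad;
    struct
    {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint32_t high, low;
#else
        uint32_t low, high;
#endif
    } u;
} large_int_sketch;

/* Hypothetical _allshl-style 64-bit left shift. The b >= 64 special case
 * lives inside the b >= 32 branch, so the result is always produced
 * through the two 32-bit halves (no early full 64-bit return). */
int64_t allshl_sketch( int64_t a, unsigned char b )
{
    large_int_sketch x, ret;

    x.quad = (uint64_t)a;

    if (b >= 32)
    {
        /* b >= 64 shifts every bit out; otherwise the low half moves up. */
        ret.u.high = (b >= 64) ? 0 : x.u.low << (b & 31);
        ret.u.low = 0;
    }
    else
    {
        /* ">> 1 >> (31 - b)" equals ">> (32 - b)" but stays defined for b == 0. */
        ret.u.high = (x.u.high << b) | (x.u.low >> 1 >> (31 - b));
        ret.u.low = x.u.low << b;
    }

    return (int64_t)ret.quad;
}
```

For example, `allshl_sketch(1, 32)` sets only the high half to 1, and any count of 64 or more yields 0 via the ternary inside the `b >= 32` branch.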