Jinoh Kang (@iamahuman) commented about dlls/ntdll/large_int.c:
return udivmod(a, b, NULL);
}
+LONGLONG __regs__allshl( LONGLONG a, unsigned char b )
+{
+    const LARGE_INTEGER x = { .QuadPart = a };
+    LARGE_INTEGER ret;
+    if (b >= 64)
+        return 0;
It appears that GCC's optimizer is having a hard time dealing with mixing full (64-bit) and partial (32-bit) writes.
Compare:
- https://godbolt.org/z/KvdGfr4bY (original)
With:
- https://godbolt.org/z/vG5TGrhaY (64-bit return elided)
I'd suggest folding the special case into the if statement below. We already lose some performance by thunking, but it's still a good idea not to slow down a builtin (or, rather, inflate its I-cache footprint) too much.
Meanwhile, clang does not appear to suffer from this problem.
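
For illustration, here is a rough sketch of what folding the `b >= 64` case into the following branch could look like. It is not the code from the patch: `large_int_sketch` and `allshl_sketch` are hypothetical stand-ins for `LARGE_INTEGER` and `__regs__allshl`, the `b >= 32` / `else` split is my assumption about what the rest of the function looks like, and the union layout assumes a little-endian target such as i386. The point is only that every path writes the two 32-bit halves, so there is no separate full 64-bit `return 0`.

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER; assumes little-endian layout. */
typedef union
{
    struct { uint32_t low; int32_t high; } u;
    int64_t quad;
} large_int_sketch;

/* Hypothetical stand-in for __regs__allshl. */
int64_t allshl_sketch( int64_t a, unsigned char b )
{
    const large_int_sketch x = { .quad = a };
    large_int_sketch ret;

    if (b >= 32)
    {
        /* b >= 64 is handled here as well: only 32-bit halves are written. */
        ret.u.high = b >= 64 ? 0 : (int32_t)(x.u.low << (b - 32));
        ret.u.low = 0;
    }
    else
    {
        uint32_t high = (uint32_t)x.u.high << b;

        /* Avoid the undefined 32-bit shift by 32 when b == 0. */
        if (b) high |= x.u.low >> (32 - b);
        ret.u.high = (int32_t)high;
        ret.u.low = x.u.low << b;
    }
    return ret.quad;
}
```

Whether GCC actually generates tighter code for this shape would need to be checked on godbolt against the two links above.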