On 7/6/22 10:50, Jinoh Kang (@iamahuman) wrote:
Jinoh Kang (@iamahuman) commented about dlls/ntdll/large_int.c:
return udivmod(a, b, NULL);
}
+LONGLONG __regs__allshl( LONGLONG a, unsigned char b )
+{
- const LARGE_INTEGER x = { .QuadPart = a };
- LARGE_INTEGER ret;
- if (b >= 64)
return 0;
It appears that GCC's optimizer is having a hard time dealing with mixing full (64-bit) and partial (32-bit) writes.
Compare:
- https://godbolt.org/z/KvdGfr4bY (original)
With:
- https://godbolt.org/z/vG5TGrhaY (64-bit return elided)
I'd suggest folding the special case into the if statement below. We do already lose performance by thunking, but it's still a good idea not to slow down (or, rather, amplify the I-cache usage of) a builtin too much.
Meanwhile, clang does not appear to suffer from this problem.
It doesn't seem to be about mixing writes (I get the "bad" pattern even if I write the high and low parts independently); rather, GCC doesn't seem to be able to CSE the zero write to LowPart.
Since it's a simple enough tweak, I'll submit a new version that's friendlier to GCC codegen.
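
For illustration only, here is a minimal sketch of what folding the b >= 64 case into the following branch might look like. It is not the actual patch: the union `split64` is a hypothetical stand-in for LARGE_INTEGER, `shl64_folded` is a made-up name, a little-endian layout is assumed, and the b >= 32 split is an assumption about the part of the function that the quoted hunk does not show.

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER; assumes a little-endian
 * layout, so the low 32 bits come first in memory. */
typedef union
{
    int64_t quad;
    struct { uint32_t low; uint32_t high; } parts;
} split64;

/* Left shift of a 64-bit value built from 32-bit halves. The b >= 64
 * case is folded into the b >= 32 branch, so the zero store to the low
 * half appears once instead of in a separate early return. */
int64_t shl64_folded( int64_t a, unsigned char b )
{
    split64 x, ret;

    x.quad = a;
    if (b >= 32)
    {
        /* For b >= 64 every bit is shifted out; otherwise the low half
         * moves into the high half, shifted by the remaining count. */
        ret.parts.high = (b >= 64) ? 0 : x.parts.low << (b - 32);
        ret.parts.low = 0;
    }
    else if (b == 0)
    {
        /* Handled separately: shifting a 32-bit value by 32 - 0 bits
         * would be undefined behavior in C. */
        ret = x;
    }
    else
    {
        ret.parts.high = (x.parts.high << b) | (x.parts.low >> (32 - b));
        ret.parts.low = x.parts.low << b;
    }
    return ret.quad;
}
```

With this shape, shl64_folded(1, 32) gives 0x100000000 and any count of 64 or more gives 0. Whether GCC then emits the shorter sequence would still need checking on godbolt.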