Jinoh Kang (@iamahuman) commented about dlls/ntdll/rtl.c:
* the 32-bit addition above to handle.
*
* This produces an integer in the range [0x7fffffe2, 0xfffffffe].
*/
- result = 0x7fffffffu * 2 - result;
- /* Perform the 2nd round of the modulus calculation.
* This produces an integer in the range [0, 0x7fffffff].
*/
- result = (result & 0x7fffffffu) + (result >> 31);
- /* If result is 0x7fffffff, set it to 0. We avoid branching here so that
* RtlUniform runs in near-constant time. This also avoids unexpected
* performance hit due to polluted branch target buffer.
*/
- result &= -((0x7fffffffu * 2 - result) >> 31);
If we assume that right-shifting a negative signed value is always an arithmetic shift (C leaves this implementation-defined), we can simplify it further:
```suggestion:-0+0
    result &= (LONG)(0x7fffffffu * 2 - result) >> 31;
```
That said, GCC (https://godbolt.org/z/65od5ErMo) already performs this optimization. Clang (https://godbolt.org/z/46x8a8r9v), meanwhile, seems to recognize the trick and replaces it with `inc + cmovns` (since `0x7fffffff + 1` is `0x80000000`, which sets SF).
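
For anyone who wants to sanity-check the equivalence locally, here is a minimal standalone sketch; it is not part of the patch. The names `fold_unsigned`, `fold_signed`, and `count_mismatches` are hypothetical, `int32_t` stands in for `LONG`, and the signed variant assumes both that converting an out-of-range `uint32_t` to `int32_t` wraps and that `>>` on a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Form currently in the patch: logical shift, then negate to build the mask. */
static uint32_t fold_unsigned(uint32_t result)
{
    return result & -((0x7fffffffu * 2 - result) >> 31);
}

/* Suggested form. ASSUMPTION: int32_t stands in for LONG, and right-shifting
 * a negative value sign-extends (implementation-defined in C). */
static uint32_t fold_signed(uint32_t result)
{
    return result & (uint32_t)((int32_t)(0x7fffffffu * 2 - result) >> 31);
}

/* Counts inputs in [0, 0x7fffffff] on which the two forms disagree.
 * Checks the edge values explicitly, then samples the range with a stride
 * instead of walking all 2^31 inputs. */
static uint32_t count_mismatches(void)
{
    static const uint32_t edges[] = { 0, 1, 0x7ffffffeu, 0x7fffffffu };
    uint32_t mismatches = 0, r;
    size_t i;

    /* 0x7fffffff is the one input that must be folded to 0. */
    assert(fold_unsigned(0x7fffffffu) == 0);

    for (i = 0; i < sizeof(edges) / sizeof(edges[0]); i++)
        mismatches += fold_unsigned(edges[i]) != fold_signed(edges[i]);

    /* r never wraps: the last value before the loop exits is below 2^32. */
    for (r = 0; r <= 0x7fffffffu; r += 0x10001u)
        mismatches += fold_unsigned(r) != fold_signed(r);

    return mismatches;
}
```

The input range is limited to [0, 0x7fffffff] because that is what the second modulus round produces according to the comment in the patch; outside that range the two forms are not expected to be meaningful.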