On Thu Jan 11 14:19:14 2024 +0000, Nikolay Sivov wrote:
Does it mean we'll have different behaviour and fp results on i386 vs the rest? This is probably fine as a workaround, but my ultimate solution would be to use logic similar to WPF's, which was even worked on at some point by @cmcadams. It's obviously a massive change, should it ever happen.
My current understanding (much of it acquired over the last 24h) is that, with default compiler settings, i386 is the odd one out. It uses the x87 instruction set, which by default computes in 80-bit extended precision with a 64-bit significand: ![image](/uploads/bcef3457de70e3909e97a0e4a45b9785/image.png)
The problem is that results are rounded back to single precision (32-bit) whenever a register value is stored to memory. With `-fexcess-precision=fast` (GCC's default for the GNU C dialects), when that happens is unpredictable: it depends on how the compiler allocates and spills registers. As a result, the same unchanged variable can have two different values at two different points in the code: one read from the register at 80-bit precision, and another read from memory after rounding to 32-bit.
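The effect can be modelled portably. This is a sketch, assuming a `double` as a stand-in for the 80-bit register (like the x87 register, it holds the exact product of two floats) and a `volatile float` as a stand-in for the rounded copy in memory; `register_and_memory_differ` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper modelling the two copies of one float result that
 * x87 code can end up with. A double holds the exact product of two floats
 * (24 + 24 = 48 significant bits fit in 53), as an 80-bit x87 register does.
 * Storing it through a volatile float forces the rounding to 24 bits that
 * a spill to memory (fstps) performs. */
static bool register_and_memory_differ(float left, float right)
{
    assert(isfinite(left) && isfinite(right)); /* NaN would compare unequal to itself */

    double in_register = (double)left * (double)right; /* exact, like the register copy */
    volatile float in_memory = (float)in_register;      /* rounded, like the stored copy */

    return in_register != (double)in_memory;
}
```

For example, `register_and_memory_differ(1.1f, 1.1f)` is true because the significand of `1.1f` is odd, so its exact square needs more than 24 bits, while `register_and_memory_differ(2.0f, 3.0f)` is false because 6 is exactly representable.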
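Calling `_controlfp` to set the x87 precision control to 24 bits (the significand width of single precision) makes the results of basic x87 arithmetic (add, subtract, multiply, divide, square root) round to single precision: ![image](/uploads/63be36e75bb92fafadbc1a47bbeab762/image.png)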
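Hence it no longer matters whether the value is read from memory or from the register; it's the same value (barring overflow and underflow, since precision control does not narrow the exponent range).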
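`_controlfp` is a Microsoft CRT function (a call such as `_controlfp(_PC_24, _MCW_PC)` selects 24-bit precision), so rather than calling it, this sketch models its effect on the hardware control word: precision control occupies bits 8-9 (00 = 24-bit, 10 = 53-bit, 11 = 64-bit, 01 reserved), and `0x037F` is the value loaded by `FINIT`. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x87 control word: precision control (PC) lives in bits 8-9. */
#define X87_PC_MASK  0x0300u
#define X87_PC_SHIFT 8

/* Significand precision (in bits) selected by a control-word value. */
static int x87_precision_bits(uint16_t cw)
{
    switch ((cw & X87_PC_MASK) >> X87_PC_SHIFT) {
    case 0:  return 24; /* single precision */
    case 2:  return 53; /* double precision */
    case 3:  return 64; /* double-extended precision, the FINIT default */
    default: return -1; /* 01 is reserved */
    }
}

/* Returns cw with only the PC field replaced, i.e. the hardware-level
 * effect of a masked update such as _controlfp(_PC_24, _MCW_PC). */
static uint16_t x87_set_precision(uint16_t cw, unsigned pc)
{
    assert(pc == 0 || pc == 2 || pc == 3); /* 01 is reserved */
    return (uint16_t)((cw & ~X87_PC_MASK) | (pc << X87_PC_SHIFT));
}
```

Starting from `0x037F` (64-bit precision), `x87_set_precision(0x037F, 0)` yields `0x007F`, which `x87_precision_bits` reports as 24-bit; the exception masks and rounding-control bits are left unchanged.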
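`-fexcess-precision=standard` doesn't change the x87 precision, but it makes the compiler round values to their declared type wherever ISO C requires it (assignments, casts and returns), which on x87 means storing them to memory and reloading them.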
Using `-mfpmath=sse` (which on i386 also requires SSE to be enabled, e.g. with `-msse2`) makes the compiler use SSE instructions and `XMM` registers, which operate on floats at 32-bit precision. This is also the default on x64.
I ran a simple test program, compiled each of the ways mentioned above, and all builds produced the same results except the default one (as expected).
The test calls a simple function with random values:
```c
float fmul(float left, float right)
{
    return left * right;
}
```
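A harness along these lines could reproduce the comparison. It is a sketch rather than the original test program: the xorshift32 generator, the [-1000, 1000] input range, and the names `fmul_f32`, `random_float` and `count_mismatches` are hypothetical. The function under test is renamed because C23 declares its own `fmul` (a narrowing multiply) in `<math.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* The function under test, renamed to avoid clashing with C23's fmul(). */
float fmul_f32(float left, float right)
{
    return left * right;
}

/* xorshift32: a small deterministic generator so runs are repeatable. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Maps a random word to a float in [-1000, 1000]. */
static float random_float(uint32_t *state)
{
    return (float)(next_random(state) / 4294967295.0 * 2000.0 - 1000.0);
}

/* Counts how often fmul_f32() disagrees with the correctly rounded product.
 * The reference product is exact in double (48 significant bits fit in 53)
 * and is rounded to float once; volatile forces that rounded value through
 * memory even on x87 builds. */
static int count_mismatches(int iterations, uint32_t seed)
{
    assert(seed != 0); /* xorshift32 stays at 0 forever from a zero seed */

    uint32_t state = seed;
    int mismatches = 0;
    for (int i = 0; i < iterations; i++) {
        float a = random_float(&state);
        float b = random_float(&state);
        volatile float expected = (float)((double)a * (double)b);
        if (fmul_f32(a, b) != expected)
            mismatches++;
    }
    return mismatches;
}
```

On x64, or on i386 with `-mfpmath=sse`, `count_mismatches(100000, 12345)` should return 0; with the default x87 code generation the product may keep its extra precision through the comparison, which is the kind of discrepancy described earlier.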
Here are the instructions that were produced:
**Default**:
```
flds 0x8(%esp)
fmuls 0x4(%esp)
```
Uses x87 instructions and returns the result in `ST(0)`, as the i386 calling convention requires for floats.
Using `_controlfp` produces the same instructions, but the multiplication is rounded to single precision instead of extended precision.
**-fexcess-precision=standard**:
```
flds 0xc(%esp)
fmuls 0x8(%esp)
fstps (%esp)
flds (%esp)
```
Again the x87 instructions are used, but the result is stored to memory and reloaded, which rounds it to single precision before it is returned.
**-mfpmath=sse**:
```
movss 0xc(%esp),%xmm0
mulss 0x8(%esp),%xmm0
movss %xmm0,(%esp)
flds (%esp)
```
The x87 instruction set is no longer used for the arithmetic, but the i386 calling convention still returns floats in `ST(0)`, so the result is stored to memory and reloaded with `flds` before returning.
**x64**:
```
mulss %xmm1,%xmm0
```
Uses the same instruction as `-mfpmath=sse`, but the x64 calling conventions pass float arguments and return values in `XMM` registers, so no memory round-trip is needed.