This fixes a rendering issue (and ultimately a crash) in PowerPoint when compiling with GCC 8.
GCC8 doesn't support the `excess-precision=standard` option under the `#pragma GCC optimize` directive.
This results in unpredictable floating point rounding leading to errors when inserting segments (with missing edges and/or triangles).
Using 24-bit precision ensures we don't have any excess precision.
-- v4: d2d1: Fix double free bug when d2d_geometry_sink_Close fails d2d1: Use 24-bit FP precision for triangulate.
From: Brendan McGrath bmcgrath@codeweavers.com
This fixes a rendering issue (and ultimately a crash) in PowerPoint when compiling with GCC 8.
GCC8 doesn't support the `excess-precision=standard` option under the `#pragma GCC optimize` directive.
This results in unpredictable floating point rounding leading to errors when inserting segments (with missing edges and/or triangles).
Using 24-bit precision ensures we don't have any excess precision. --- dlls/d2d1/geometry.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/dlls/d2d1/geometry.c b/dlls/d2d1/geometry.c index d6ee91de005..9b0b7844739 100644 --- a/dlls/d2d1/geometry.c +++ b/dlls/d2d1/geometry.c @@ -16,10 +16,6 @@ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA */
-#if defined(__GNUC__) && !defined(__clang__) -#pragma GCC optimize "O2,no-strict-aliasing,excess-precision=standard" -#endif - #include "d2d1_private.h" #include <float.h>
@@ -1539,7 +1535,7 @@ static BOOL d2d_cdt_triangulate(struct d2d_cdt *cdt, size_t start_vertex, size_t return TRUE; }
- /* More than tree vertices, divide. */ + /* More than three vertices, divide. */ cut = vertex_count / 2; if (!d2d_cdt_triangulate(cdt, start_vertex, cut, &left_outer, &left_inner)) return FALSE; @@ -2289,6 +2285,9 @@ static HRESULT d2d_path_geometry_triangulate(struct d2d_geometry *geometry) size_t vertex_count, i, j; struct d2d_cdt cdt = {0}; D2D1_POINT_2F *vertices; +#ifdef __i386__ + unsigned int control_word_x87, mask = 0; +#endif
for (i = 0, vertex_count = 0; i < geometry->u.path.figure_count; ++i) { @@ -2339,10 +2338,20 @@ static HRESULT d2d_path_geometry_triangulate(struct d2d_geometry *geometry)
cdt.free_edge = ~0u; cdt.vertices = vertices; + +#ifdef __i386__ + control_word_x87 = _controlfp(0, 0); + _controlfp(_PC_24, mask = _MCW_PC); +#endif if (!d2d_cdt_triangulate(&cdt, 0, vertex_count, &left_edge, &right_edge)) goto fail; if (!d2d_cdt_insert_segments(&cdt, geometry)) goto fail; +#ifdef __i386__ + _controlfp(control_word_x87, _MCW_PC); + mask = 0; +#endif + if (!d2d_cdt_generate_faces(&cdt, geometry)) goto fail;
@@ -2354,6 +2363,9 @@ fail: geometry->fill.vertex_count = 0; free(vertices); free(cdt.edges); +#ifdef __i386__ + if (mask) _controlfp(control_word_x87, mask); +#endif return E_FAIL; }
From: Brendan McGrath bmcgrath@codeweavers.com
geometry->fill.bezier_vertices was being freed on the failed path in d2d_geometry_sink_Close and then again when the path geometry was released (in d2d_geometry_cleanup).
By setting it to NULL after freeing it initially, all other calls to free it are a no-op. --- dlls/d2d1/geometry.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/dlls/d2d1/geometry.c b/dlls/d2d1/geometry.c index 9b0b7844739..3da3ad2e65b 100644 --- a/dlls/d2d1/geometry.c +++ b/dlls/d2d1/geometry.c @@ -3247,6 +3247,7 @@ done: if (FAILED(hr)) { free(geometry->fill.bezier_vertices); + geometry->fill.bezier_vertices = NULL; geometry->fill.bezier_vertex_count = 0; d2d_path_geometry_free_figures(geometry); geometry->u.path.state = D2D_GEOMETRY_STATE_ERROR;
On Thu Jan 11 06:45:47 2024 +0000, Brendan Shanks wrote:
Should the `_controlfp()` calls be ifdef'd to i386 only? The [MS docs](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-...) say that "On the ARM, ARM64, and x64 platforms, changing the infinity mode or the floating-point precision isn't supported. If the precision control mask is used on the x64 platform, the function raises an assertion, and the invalid parameter handler is invoked, as described in Parameter validation."
You're right. I did have the #ifdefs in at one stage, but I thought (in theory at least) we could run in to the same issue with the other architectures. So I removed them. Even though I had a test app that (when compiled as PE32+) worked fine.
I didn't immediately understand why that was tbh. But I just took a look at the generated machine code for the 64-bit `d2d1.dll` and learnt that x64 does floating point different to x86 (it's not using the x87 instruction set).
I also just checked the debug when running the PE32+ test app and it seems to show the calls to `_controlfp` are a no-op anyway.
So I've put those #ifdefs back.
On Thu Jan 11 06:45:47 2024 +0000, Brendan McGrath wrote:
You're right. I did have the #ifdefs in at one stage, but I thought (in theory at least) we could run in to the same issue with the other architectures. So I removed them. Even though I had a test app that (when compiled as PE32+) worked fine. I didn't immediately understand why that was tbh. But I just took a look at the generated machine code for the 64-bit `d2d1.dll` and learnt that x64 does floating point different to x86 (it's not using the x87 instruction set). I also just checked the debug when running the PE32+ test app and it seems to show the calls to `_controlfp` are a no-op anyway. So I've put those #ifdefs back.
Does it mean we'll have different behaviour and fp results on i386 vs the rest? This is probably fine as workaround, but my ultimate solution would be to use similar logic from WPF, that was even worked on at some point by @cmcadams. It's obviously a massive change, should it ever happen.
On Thu Jan 11 14:19:14 2024 +0000, Nikolay Sivov wrote:
Does it mean we'll have different behaviour and fp results on i386 vs the rest? This is probably fine as workaround, but my ultimate solution would be to use similar logic from WPF, that was even worked on at some point by @cmcadams. It's obviously a massive change, should it ever happen.
My current understanding (much of it acquired over the last 24h) is that, when using default settings, i386 is the odd one out. It uses the x87 instruction set, which by default, uses 80-bit extended precision. The fraction part being 64-bit: ![image](/uploads/bcef3457de70e3909e97a0e4a45b9785/image.png)
The problem with this is that it converts results back to single precision (32-bit) when storing the registers back into memory. The time at which this occurs when `fexcess-precision=fast` is undetermined. This can result in two different values for the same unchanged variable at two different times in the code (one read from the register at 80-bits, and another read from memory rounded to 32-bit).
The `_controlfp` instructs the x87 to use 24-bit precision (for the fraction part) which is single precision (32-bit): ![image](/uploads/63be36e75bb92fafadbc1a47bbeab762/image.png)
Hence it doesn't matter if the value is read from memory or the register; it's the same value.
The `fexcess-precision=standard` doesn't change the x87 precision, but causes values to be immediately written to memory (hence rounded).
Using `fpmath=sse` causes the use of SSE instructions and `XMM` registers, which use 32-bit precision for floats. This is also the default for x64.
I just ran a simple test program (compiled all the different ways mentioned) and they all produced the same results; except for the default option (as expected).
The test is on a simple function with random values passed in. The function is: ``` float fmul(float left, float right) { return left * right; } ```
Here's the instructions that were produced:
**Default**: ``` flds 0x8(%esp) fmuls 0x4(%esp) ```
Uses x87 instructions and returns result in register (which must be the 32-bit returning convention).
Using `_fpcontrol` produces the same instructions (but single precision instead of extended is used).
**fexcess-precision=standard**: ``` flds 0xc(%esp) fmuls 0x8(%esp) fstps (%esp) flds (%esp) ```
Again the x87 instructions are used, but the value is stored to and then reloaded from memory to round the value before returning it.
mpmath=sse: ``` movss 0xc(%esp),%xmm0 mulss 0x8(%esp),%xmm0 movss %xmm0,(%esp) flds (%esp) ```
The x87 instruction set is no longer used; but the 32-bit calling convention must return floats in the `ST(0)` register (hence it's stored to and then reloaded from memory before returning).
x64: ``` mulss %xmm1,%xmm0 ```
Uses the same instruction as `mpmath=see` but the calling conventions of x64 must allow the use of the `XMM` registers for passing function parameters and returning values.
This merge request was approved by Matteo Bruni.
This merge request was approved by Nikolay Sivov.