http://bugs.winehq.org/show_bug.cgi?id=28422
Summary: scanf family of functions provides only 7 digits of precision for converting doubles and long doubles
Product: Wine
Version: 1.3.26
Platform: x86
OS/Version: Linux
Status: UNCONFIRMED
Severity: critical
Priority: P2
Component: msvcrt
AssignedTo: wine-bugs@winehq.org
ReportedBy: irwin@beluga.phys.uvic.ca
Created an attachment (id=36434) (http://bugs.winehq.org/attachment.cgi?id=36434): Patch to greatly reduce numerical noise in scanf conversion of doubles and long doubles
If I compile the test_scanf.c code attached below with "gcc test_scanf.c" under MinGW/MSYS on wine-1.3.26 and run the resulting a.exe under Wine, I get the following results:
bash.exe-3.1$ echo "1.1e-30" |./a.exe
1.1e-30 is input string
1.1000004917384256455392e-030 = 3f9b64f8772f16505258 is long double value
1.1000004917384256198806e-030 = 39b64f8772f16505 is double value
1.1000004917384e-030: 13: 1.1000009834770454030908e-030 = 39b64f881a45deaf
1.10000049173843e-030: 14: 1.1000009834770755310078e-030 = 39b64f881a45df5b
1.100000491738426e-030: 15: 1.1000009834770715022747e-030 = 39b64f881a45df44
1.1000004917384256e-030: 16: 1.1000009834770711519501e-030 = 39b64f881a45df42
1.10000049173842562e-030: 17: 1.1000009834770711519501e-030 = 39b64f881a45df42
The following annotations explain what the test application does to produce each line of the output above:
1. Read a string from stdin (in this case 1.1e-30) and output the result.
2. Convert that string to a long double value (80-bit floating point) using sscanf with the format string "%Le", and output the result in decimal and hexadecimal. The decimal result (in this case 1.1000004917384256455392e-030) immediately demonstrates a severe loss of significance in sscanf: it has a relative error of ~5e-7 rather than the expected relative error of order 1.e-20. In the hexadecimal representation of the long double, the 64-bit mantissa is shifted left by one bit _on output_, dropping the explicit leading bit to mimic the hidden bit of double values; this makes hex comparison with the corresponding double on the third output line easier.
3. Convert the string to a double value (64-bit floating point) using sscanf with the format string "%le", and output the result in both decimal and (unshifted) hexadecimal form. The decimal form (1.1000004917384256198806e-030) shows a relative error of 5.e-7 rather than the expected relative error of order 1.e-16.
4-8. Use sprintf to write the double value to a character string in rounded "%e" form with a precision of i, where i ranges from 13 to 17 (a precision of i gives i + 1 significant digits). Convert that string back to a double value (64-bit floating point) using sscanf with the format string "%le", and output the rounded string, the value of i, and the resulting double in both decimal and (unshifted) hexadecimal form. This test indicates how much precision is required when converting a double to a rounded string in order to read back a double identical to the original. In this case sscanf gives consistent results at a precision of 16 and beyond, but they differ from the original double because of the large numerical noise in sscanf's results; the read-back values carry roughly twice the error of the original double.
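The attached test_scanf.c is not reproduced in this report, so the following is only a hypothetical reconstruction of steps 2-8 for illustration; the function names are made up. It shows how the double bit pattern can be dumped, how the shifted long double hex form of step 2 can be produced, and how the round trip of steps 4-8 works. The long double dump assumes the x87 80-bit little-endian layout and is compiled only on x86 targets where LDBL_MANT_DIG is 64.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a double (steps 3-8). */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Step 2 hex form of an x87 80-bit long double: the 16-bit sign/exponent
   field, then the 64-bit mantissa shifted left by one bit so the explicit
   leading bit is dropped and the digits line up with a double's fraction. */
static void format_x87_bits(uint16_t sign_exp, uint64_t mantissa, char out[21])
{
    snprintf(out, 21, "%04x%016llx", (unsigned)sign_exp,
             (unsigned long long)(mantissa << 1));
}

/* Steps 4-8: print d with "%.*e" at the given precision into buf, then read
   buf back with sscanf("%le").  Returns the value read back, or 0.0 if the
   string could not be parsed. */
static double round_trip(double d, int precision, char *buf, size_t bufsize)
{
    double back = 0.0;
    assert(precision >= 0);
    snprintf(buf, bufsize, "%.*e", precision, d);
    if (sscanf(buf, "%le", &back) != 1)
        return 0.0;
    return back;
}

/* Hypothetical driver printing lines in the same layout as the report. */
static void scanf_report(const char *input)
{
    long double ld = 0.0L;
    double d = 0.0;
    char buf[64];

    printf("%s is input string\n", input);
    sscanf(input, "%Le", &ld);
#if (defined(__i386__) || defined(__x86_64__)) && LDBL_MANT_DIG == 64
    {
        /* x87 layout, little endian: 8 mantissa bytes, then 2 sign/exponent bytes. */
        unsigned char raw[sizeof ld];
        uint64_t mant;
        uint16_t se;
        char hex[21];
        memcpy(raw, &ld, sizeof ld);
        memcpy(&mant, raw, 8);
        memcpy(&se, raw + 8, 2);
        format_x87_bits(se, mant, hex);
        printf("%.22Le = %s is long double value\n", ld, hex);
    }
#endif
    sscanf(input, "%le", &d);
    printf("%.22e = %016llx is double value\n", d,
           (unsigned long long)double_bits(d));
    for (int i = 13; i <= 17; i++) {
        double back = round_trip(d, i, buf, sizeof buf);
        printf("%s: %d: %.22e = %016llx\n", buf, i, back,
               (unsigned long long)double_bits(back));
    }
}
```

Calling scanf_report("1.1e-30") from a main function should print lines in the same layout as the outputs in this report; with a correctly rounding C library, round_trip at precisions 13 to 17 returns the original double for this input.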
I then applied the attached patch to wine-1.3.26/dlls/msvcrt/scanf.h, which fixes this precision problem for conversions of doubles and long doubles by the scanf family of functions. (Note: wine-1.3.26/dlls/msvcrt/scanf.h and wine-1.3.28/dlls/msvcrt/scanf.h are identical, so this patch should also apply to wine-1.3.28.) The patch ensures that all calculations during the conversion are done in long double precision, with results scaled so that every calculation except a possible final multiplication or division is done on integers stored in long double form. These changes ensure exact results for input numbers between 0 and 2^65 - 1 that can be represented exactly in long double form, and results with minimal numerical noise in other cases. Here are the results of the same test with the patched Wine:
bash.exe-3.1$ echo "1.1e-30" |./a.exe
1.1e-30 is input string
1.0999999999999999999835e-030 = 3f9b64f86cb9cefaf7a0 is long double value
1.0999999999999999165078e-030 = 39b64f86cb9cefaf is double value
1.1000000000000e-030: 13: 1.0999999999999999165078e-030 = 39b64f86cb9cefaf
1.10000000000000e-030: 14: 1.0999999999999999165078e-030 = 39b64f86cb9cefaf
1.100000000000000e-030: 15: 1.0999999999999999165078e-030 = 39b64f86cb9cefaf
1.0999999999999999e-030: 16: 1.0999999999999999165078e-030 = 39b64f86cb9cefaf
1.09999999999999992e-030: 17: 1.0999999999999999165078e-030 = 39b64f86cb9cefaf
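The conversion strategy described for the patch (digits accumulated as integers held in long double, with at most one final multiplication or division) can be sketched as follows. This is not the attached patch: parse_decimal_ld and pow10_ld are hypothetical names, and the sketch assumes a simplified grammar (optional sign, digits, optional fraction, optional exponent) with no field widths, hex floats, infinities, NaNs, or overflow checks. On platforms where long double is the same as double, it only delivers double precision.

```c
#include <assert.h>
#include <ctype.h>

/* 10^n (n >= 0) by binary exponentiation in long double.  For an x87 80-bit
   long double, 10^n is exactly representable only up to n = 27, so larger
   scalings add a little rounding of their own. */
static long double pow10_ld(int n)
{
    long double result = 1.0L, base = 10.0L;
    assert(n >= 0);
    while (n > 0) {
        if (n & 1)
            result *= base;
        base *= base;
        n >>= 1;
    }
    return result;
}

/* Hypothetical sketch: accumulate every significant digit into an
   integer-valued long double, track the decimal exponent separately, and
   scale once at the end by a power of ten. */
static long double parse_decimal_ld(const char *s)
{
    long double mant = 0.0L;  /* exact while the digit string fits in the mantissa */
    int exp10 = 0;            /* value = mant * 10^exp10 */
    int negative = 0;

    if (*s == '+' || *s == '-')
        negative = (*s++ == '-');
    while (isdigit((unsigned char)*s))
        mant = mant * 10.0L + (*s++ - '0');
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) {
            mant = mant * 10.0L + (*s++ - '0');
            exp10--;          /* each fractional digit moves the point one place */
        }
    }
    if (*s == 'e' || *s == 'E') {
        int e = 0, eneg = 0;
        s++;
        if (*s == '+' || *s == '-')
            eneg = (*s++ == '-');
        while (isdigit((unsigned char)*s))
            e = e * 10 + (*s++ - '0');
        exp10 += eneg ? -e : e;
    }
    /* Final scaling: the only step expected to round when the digits fit
       in the mantissa and 10^|exp10| is exact. */
    if (exp10 > 0)
        mant *= pow10_ld(exp10);
    else if (exp10 < 0)
        mant /= pow10_ld(-exp10);
    return negative ? -mant : mant;
}
```

For example, "1.1e-30" accumulates the integer 11 exactly and then performs a single division by 10^31, instead of accumulating rounding error digit by digit.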
The attached patch vastly improves the results for the scanf family of functions. The relative error of "%Le" conversion to long double drops from 5.e-7 to 2.e-20, and that of "%le" conversion to double drops from 5.e-7 to ~1.e-16. Furthermore, the round-trip test (converting a double to a rounded string and back to a double) now shows exact agreement at every precision from 13 to 17. (For other tests with repeating-decimal input such as 1.111111111111111111111111111e-30, exact agreement was obtained at a rounded precision of 16.)
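The relative errors quoted above follow directly from the printed decimal values for the input x = 1.1e-30. Here u denotes the unit roundoff, 2^-p for a p-bit significand (p = 64 for the x87 long double, p = 53 for double); a correctly rounded conversion has a relative error of at most u.

```latex
\[
  \varepsilon_{\mathrm{rel}} = \frac{|\hat{x} - x|}{|x|}, \qquad x = 1.1\times10^{-30}
\]
\[
  \text{unpatched: } \frac{1.1000004917384256 - 1.1}{1.1} \approx 4.5\times10^{-7}
\]
\[
  \text{patched long double: } \frac{|1.0999999999999999999835 - 1.1|}{1.1} \approx 1.5\times10^{-20}
  < u_{\mathrm{ld}} = 2^{-64} \approx 5.4\times10^{-20}
\]
\[
  \text{patched double: } \frac{|1.0999999999999999165078 - 1.1|}{1.1} \approx 7.6\times10^{-17}
  < u_{\mathrm{d}} = 2^{-53} \approx 1.1\times10^{-16}
\]
```

Both patched errors fall below the respective unit roundoff, which is consistent with (though for a single input does not prove) correctly rounded conversions.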
Just as a matter of interest, here are the results for the same test application (compiled with gcc for Linux) on Linux:
software@raven> echo "1.1e-30" |./a.out
1.1e-30 is input string
1.0999999999999999999835e-30 = 3f9b64f86cb9cefaf7a0 is long double value
1.0999999999999999165078e-30 = 39b64f86cb9cefaf is double value
1.1000000000000e-30: 13: 1.0999999999999999165078e-30 = 39b64f86cb9cefaf
1.10000000000000e-30: 14: 1.0999999999999999165078e-30 = 39b64f86cb9cefaf
1.100000000000000e-30: 15: 1.0999999999999999165078e-30 = 39b64f86cb9cefaf
1.0999999999999999e-30: 16: 1.0999999999999999165078e-30 = 39b64f86cb9cefaf
1.09999999999999992e-30: 17: 1.0999999999999999165078e-30 = 39b64f86cb9cefaf
This is identical to the patched Wine result except for the two-digit exponents.
When I created JPL binary ephemerides (roughly 1.5GB of doubles) from JPL ASCII ephemerides with my ephcom-2.0.2 software on Linux and on Wine-1.3.26 built with the attached patch, the double data in the Linux- and Wine-produced binary ephemerides agreed exactly in most cases. However, 0.04 per cent of the time the double values on the two platforms differed at the 1.e-16 relative level. This is consistent with the scanf errors on the two platforms differing on average by one in the last bit of the long double representation: because a double has 11 fewer mantissa bits than a long double (counting the hidden bit), such small differences would by chance propagate to the double representation roughly 0.04 per cent of the time. So I feel this patched scanf family of functions does very well against the Linux equivalent.
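The propagation estimate can be checked with a small integer model, assuming that the two platforms' long double results differ by exactly one unit in the last place and that narrowing to double rounds half to even. The 64-bit x87 mantissa is modelled as an integer m; narrowing to the 53 bits of a double discards its low 11 bits. The helper names are hypothetical, and the model ignores a carry into a new exponent.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low 'drop' bits of m with round-half-to-even and return the
   kept high part (a carry out of the kept part is simply kept). */
static uint64_t round_drop_bits(uint64_t m, unsigned drop)
{
    assert(drop > 0 && drop < 64);
    uint64_t kept = m >> drop;
    uint64_t rem = m & ((UINT64_C(1) << drop) - 1);
    uint64_t half = UINT64_C(1) << (drop - 1);
    if (rem > half || (rem == half && (kept & 1)))
        kept++;
    return kept;
}

/* Fraction of adjacent mantissa pairs (m, m + 1), for m in [0, count), whose
   rounded values differ when 'drop' low bits are discarded. */
static double fraction_changed(uint64_t count, unsigned drop)
{
    uint64_t changed = 0;
    for (uint64_t m = 0; m < count; m++)
        if (round_drop_bits(m, drop) != round_drop_bits(m + 1, drop))
            changed++;
    return (double)changed / (double)count;
}
```

Each block of 2048 consecutive mantissas contains exactly one rounding boundary, so fraction_changed(UINT64_C(1) << 20, 11) returns exactly 1/2048, about 0.049 per cent, the same order as the 0.04 per cent observed.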
After a lot of thought, I have classified this apparently long-standing bug as "critical". The reason is that the scanf family is a fundamental building block of any platform, and the numerical precision of the scanf family for double values is critical to many applications (such as ephcom-2.0.2, where I first discovered the issue). One could argue that the bug is not critical because a workaround is available (Dan Kegel noted this on the wine-devel list): using Microsoft's version of msvcrt.dll rather than Wine's. However, there is a chicken-and-egg problem here: presumably some of the apps that use Microsoft's msvcrt.dll do so because the scanf precision bug in Wine's version is causing them trouble. In any case, applying this patch going forward is fundamental to Wine as an independent platform, which is why I chose "critical" as a first approximation, subject, of course, to any reclassification Wine developers want to make for this bug.