..and improve performance and precision of parse_numeric_literal() for doubles, and replace artificial (INT_MAX/100) border conditions with the true +/- 308
10^308 is the largest representable power of 10 in double
From: toxieainc <toxie@ainc.de>
10^308 is the largest representable power of 10 in double
---
 dlls/vbscript/lex.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/dlls/vbscript/lex.c b/dlls/vbscript/lex.c
index 8c5c69ea429..414dc27f9cc 100644
--- a/dlls/vbscript/lex.c
+++ b/dlls/vbscript/lex.c
@@ -94,6 +94,31 @@ static const struct {
     {L"xor", tXOR}
 };
 
+static const double pow10[309] = {
+1.e0,1.e1,1.e2,1.e3,1.e4,1.e5,1.e6,1.e7,1.e8,1.e9,1.e10,1.e11,1.e12,1.e13,1.e14,1.e15,1.e16,
+1.e17,1.e18,1.e19,1.e20,1.e21,1.e22,1.e23,1.e24,1.e25,1.e26,1.e27,1.e28,1.e29,1.e30,1.e31,
+1.e32,1.e33,1.e34,1.e35,1.e36,1.e37,1.e38,1.e39,1.e40,1.e41,1.e42,1.e43,1.e44,1.e45,1.e46,
+1.e47,1.e48,1.e49,1.e50,1.e51,1.e52,1.e53,1.e54,1.e55,1.e56,1.e57,1.e58,1.e59,1.e60,1.e61,
+1.e62,1.e63,1.e64,1.e65,1.e66,1.e67,1.e68,1.e69,1.e70,1.e71,1.e72,1.e73,1.e74,1.e75,1.e76,
+1.e77,1.e78,1.e79,1.e80,1.e81,1.e82,1.e83,1.e84,1.e85,1.e86,1.e87,1.e88,1.e89,1.e90,1.e91,
+1.e92,1.e93,1.e94,1.e95,1.e96,1.e97,1.e98,1.e99,1.e100,1.e101,1.e102,1.e103,1.e104,1.e105,
+1.e106,1.e107,1.e108,1.e109,1.e110,1.e111,1.e112,1.e113,1.e114,1.e115,1.e116,1.e117,1.e118,
+1.e119,1.e120,1.e121,1.e122,1.e123,1.e124,1.e125,1.e126,1.e127,1.e128,1.e129,1.e130,1.e131,
+1.e132,1.e133,1.e134,1.e135,1.e136,1.e137,1.e138,1.e139,1.e140,1.e141,1.e142,1.e143,1.e144,
+1.e145,1.e146,1.e147,1.e148,1.e149,1.e150,1.e151,1.e152,1.e153,1.e154,1.e155,1.e156,1.e157,
+1.e158,1.e159,1.e160,1.e161,1.e162,1.e163,1.e164,1.e165,1.e166,1.e167,1.e168,1.e169,1.e170,
+1.e171,1.e172,1.e173,1.e174,1.e175,1.e176,1.e177,1.e178,1.e179,1.e180,1.e181,1.e182,1.e183,
+1.e184,1.e185,1.e186,1.e187,1.e188,1.e189,1.e190,1.e191,1.e192,1.e193,1.e194,1.e195,1.e196,
+1.e197,1.e198,1.e199,1.e200,1.e201,1.e202,1.e203,1.e204,1.e205,1.e206,1.e207,1.e208,1.e209,
+1.e210,1.e211,1.e212,1.e213,1.e214,1.e215,1.e216,1.e217,1.e218,1.e219,1.e220,1.e221,1.e222,
+1.e223,1.e224,1.e225,1.e226,1.e227,1.e228,1.e229,1.e230,1.e231,1.e232,1.e233,1.e234,1.e235,
+1.e236,1.e237,1.e238,1.e239,1.e240,1.e241,1.e242,1.e243,1.e244,1.e245,1.e246,1.e247,1.e248,
+1.e249,1.e250,1.e251,1.e252,1.e253,1.e254,1.e255,1.e256,1.e257,1.e258,1.e259,1.e260,1.e261,
+1.e262,1.e263,1.e264,1.e265,1.e266,1.e267,1.e268,1.e269,1.e270,1.e271,1.e272,1.e273,1.e274,
+1.e275,1.e276,1.e277,1.e278,1.e279,1.e280,1.e281,1.e282,1.e283,1.e284,1.e285,1.e286,1.e287,
+1.e288,1.e289,1.e290,1.e291,1.e292,1.e293,1.e294,1.e295,1.e296,1.e297,1.e298,1.e299,1.e300,
+1.e301,1.e302,1.e303,1.e304,1.e305,1.e306,1.e307,1.e308};
+
 static inline BOOL is_identifier_char(WCHAR c)
 {
     return iswalnum(c) || c == '_';
@@ -306,7 +331,7 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
 
             do {
                 e = e*10 + *(ctx->ptr++) - '0';
-                if(sign == -1 && -e+exp < -(INT_MAX/100)) {
+                if(sign == -1 && -e+exp < -308) {
                     /* The literal will be rounded to 0 anyway. */
                     while(is_digit(*ctx->ptr))
                         ctx->ptr++;
@@ -314,7 +339,8 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
                     return tDouble;
                 }
 
-                if(sign*e + exp > INT_MAX/100) {
+                if(sign*e + exp > 308) {
+                    /* This would result in infinity. */
                     FIXME("Invalid numeric literal\n");
                     return 0;
                 }
@@ -328,7 +354,7 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
             return tInt;
         }
 
-        r = exp>=0 ? d*pow(10, exp) : d/pow(10, -exp);
+        r = exp>=0 ? d*pow10[exp] : d/pow10[-exp];
         if(isinf(r)) {
             FIXME("Invalid numeric literal\n");
             return 0;
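For readers following along: the 308 bound corresponds to `DBL_MAX_10_EXP` from `<float.h>` on platforms with IEEE 754 binary64 doubles. The following is only a minimal sketch of the lookup-and-scale idea, assuming such a platform. The table is a shortened, illustrative subset, and the names `pow10_small`, `scale_by_pow10` and `exponent_overflows` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <float.h>

/* Illustrative subset only; the table in the patch runs up to 1.e308. */
static const double pow10_small[] = {
    1.e0, 1.e1, 1.e2, 1.e3, 1.e4, 1.e5, 1.e6, 1.e7, 1.e8
};
#define POW10_SMALL_MAX ((int)(sizeof(pow10_small) / sizeof(pow10_small[0])) - 1)

/* Hypothetical helper mirroring the patched expression
 *     r = exp>=0 ? d*pow10[exp] : d/pow10[-exp];
 * The caller must keep exp within the table bounds. */
static double scale_by_pow10(double d, int exp)
{
    assert(exp >= -POW10_SMALL_MAX && exp <= POW10_SMALL_MAX);
    return exp >= 0 ? d * pow10_small[exp] : d / pow10_small[-exp];
}

/* Nonzero if 10^exp is above the largest power of ten that a double can hold. */
static int exponent_overflows(int exp)
{
    return exp > DBL_MAX_10_EXP;
}
```

With this sketch, `scale_by_pow10(15.0, -1)` yields 1.5, and `exponent_overflows(309)` is nonzero while `exponent_overflows(308)` is zero.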
The failed test also seems to show up in other PRs?
I think it would be great if we could remove that code and use something like `wcstod` instead. The main reason we reimplement that here is mostly historical; in the past we couldn't use msvcrt/ucrtbase functions, but that's no longer the case.
On Wed Mar 19 14:59:12 2025 +0000, Jacek Caban wrote:
> I think it would be great if we could remove that code and use something like `wcstod` instead. The main reason we reimplement that here is mostly historical; in the past we couldn't use msvcrt/ucrtbase functions, but that's no longer the case.
The problem with `wcstod` is that it depends on the current locale, and that's not always desired. Maybe we could use something from oleaut32, even if duplicated?
On Wed Mar 19 14:59:12 2025 +0000, Nikolay Sivov wrote:
> The problem with `wcstod` is that it depends on the current locale, and that's not always desired. Maybe we could use something from oleaut32, even if duplicated?
Using `_wcstod_l` would probably solve the locale problem.
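To make the suggestion concrete, here is a minimal sketch of a locale-independent conversion. It is not taken from the patch: the helper name `parse_double_c_locale` is hypothetical, and the non-Windows branch (POSIX `newlocale`/`uselocale`) is an assumption added only so the sketch also builds outside the Microsoft CRT, where `_create_locale` and `_wcstod_l` are not available.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* for newlocale()/uselocale() on POSIX */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of the patch: parse a double with '.' as the
 * decimal point regardless of the locale the process is running in. */
static double parse_double_c_locale(const wchar_t *str, wchar_t **end)
{
    double r;
#ifdef _WIN32
    _locale_t loc = _create_locale(LC_ALL, "C");
#else
    locale_t loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
#endif

    assert(str != NULL);
    if (!loc) {
        /* Locale creation failed: report that nothing was parsed. */
        if (end) *end = (wchar_t *)str;
        return 0.0;
    }
#ifdef _WIN32
    r = _wcstod_l(str, end, loc);
    _free_locale(loc);
#else
    {
        /* Temporarily switch this thread to the "C" locale. */
        locale_t old = uselocale(loc);
        r = wcstod(str, end);
        uselocale(old);
        freelocale(loc);
    }
#endif
    return r;
}
```

Called as `parse_double_c_locale(L"1.5e3xyz", &end)`, this returns 1500 and leaves `end` pointing at the `x`. Creating and freeing the locale on every call is done here only to keep the sketch short.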
On Wed Mar 19 15:04:44 2025 +0000, Jacek Caban wrote:
> Using `_wcstod_l` would probably solve the locale problem.
That would also be a nice solution.
Or maybe this is also another can of worms: https://medium.com/@tomysshadow/strtod-what-does-it-take-to-convert-strings-... (especially: 'This means — on Linux — that there is no way to guarantee that strtod will treat periods as decimal points. Of course, in practice, it’s probably safe to assume everyone has the "en_US" locale installed.')
On Wed Mar 19 16:00:55 2025 +0000, Carsten Waechter wrote:
> Or maybe this is also another can of worms: https://medium.com/@tomysshadow/strtod-what-does-it-take-to-convert-strings-... (especially: 'This means — on Linux — that there is no way to guarantee that strtod will treat periods as decimal points. Of course, in practice, it’s probably safe to assume everyone has the "en_US" locale installed.')
Note that the article you have linked is incorrect regarding the C locale (and this is the locale you should be using in jscript while parsing numbers).
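As background for this point: the C standard specifies "." as the decimal point of the "C" locale, and that locale does not depend on which locales are installed on the system. A small sketch that checks this at run time; the function name is hypothetical, and note that it switches the process-wide `LC_NUMERIC` setting as a side effect.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical check: select the "C" locale for numeric formatting and
 * report whether its decimal point is a period.
 * Side effect: changes the global LC_NUMERIC locale of the process. */
static int c_locale_decimal_point_is_period(void)
{
    const struct lconv *lc;

    if (!setlocale(LC_NUMERIC, "C"))
        return 0;

    lc = localeconv();
    assert(lc != NULL);
    return strcmp(lc->decimal_point, ".") == 0;
}
```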
I read into it some more. The suggestions and remarks by Jacek and Piotr make sense, but I wonder whether this is too large a hammer for the conversion here. As I understand it, one still has to do some basic parsing first to distinguish between 64-bit ints and doubles in order to do an exact conversion.
So some steps would be done twice. Additionally, one first has to create the C locale; the follow-up call to `_wcstod_l` then has to process that locale again before doing the conversion, and the locale has to be freed afterwards.
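The kind of pre-scan meant here could look roughly like the sketch below. It is only an illustration with hypothetical names (`classify_literal`, `enum literal_kind`), not the lexer's actual logic, and it ignores the case where an integer literal is too large for a 64-bit int and would have to become a double anyway.

```c
#include <assert.h>
#include <wchar.h>

enum literal_kind { LITERAL_INVALID, LITERAL_INT, LITERAL_DOUBLE };

static int is_ascii_digit(wchar_t c)
{
    return c >= L'0' && c <= L'9';
}

/* Hypothetical pre-scan: decide from the shape of the literal alone whether
 * it is an integer or needs a string->double conversion. */
static enum literal_kind classify_literal(const wchar_t *s)
{
    enum literal_kind kind = LITERAL_INT;
    int digits = 0;

    assert(s != NULL);
    while (is_ascii_digit(*s)) { s++; digits++; }
    if (*s == L'.') {
        /* A fractional part makes it a double. */
        kind = LITERAL_DOUBLE;
        s++;
        while (is_ascii_digit(*s)) { s++; digits++; }
    }
    if (!digits)
        return LITERAL_INVALID;
    if (*s == L'e' || *s == L'E') {
        /* So does an exponent, which needs at least one digit. */
        kind = LITERAL_DOUBLE;
        s++;
        if (*s == L'+' || *s == L'-') s++;
        if (!is_ascii_digit(*s))
            return LITERAL_INVALID;
        while (is_ascii_digit(*s)) s++;
    }
    return *s ? LITERAL_INVALID : kind;
}
```

Only literals classified as `LITERAL_DOUBLE` would then be handed to the library conversion, which is where the duplicated scanning comes from.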
Any guidance on how to follow up on this then?
On Wed Mar 26 15:00:24 2025 +0000, Carsten Waechter wrote:
> Any guidance on how to follow up on this then?
Doing string->double conversion correctly is hard, and it definitely makes sense to use a library for that. Even with your proposed patch, the jscript code is inaccurate.
Note that the C locale can be created once; there's no need to recreate it while parsing every number.
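One way to read "created once" is a lazily initialized, cached locale object, sketched below. The names `get_c_locale` and `parse_double_cached` are hypothetical, the POSIX branch is an assumption added so the sketch builds outside the Microsoft CRT, and the lazy initialization is not thread-safe as written.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* for newlocale()/uselocale() on POSIX */
#endif
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

#ifdef _WIN32
typedef _locale_t c_locale_t;
#else
typedef locale_t c_locale_t;
#endif

/* Hypothetical accessor: creates the "C" locale on first use and reuses it
 * afterwards. Not thread-safe; real code would need one-time initialization. */
static c_locale_t get_c_locale(void)
{
    static c_locale_t cached;

    if (!cached) {
#ifdef _WIN32
        cached = _create_locale(LC_ALL, "C");
#else
        cached = newlocale(LC_ALL_MASK, "C", (locale_t)0);
#endif
    }
    return cached;
}

/* Parse a double using the cached "C" locale; no per-call create/free. */
static double parse_double_cached(const wchar_t *str, wchar_t **end)
{
    c_locale_t loc = get_c_locale();
    double r;

    assert(str != NULL);
    if (!loc) {
        if (end) *end = (wchar_t *)str;
        return 0.0;
    }
#ifdef _WIN32
    r = _wcstod_l(str, end, loc);
#else
    {
        locale_t old = uselocale(loc);
        r = wcstod(str, end);
        uselocale(old);
    }
#endif
    return r;
}
```

Repeated calls to `get_c_locale()` return the same object, so the per-number cost is just the conversion itself.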