[PATCH 0/1] MR7591: Improve performance and precision of parse_numeric_literal() for doubles

Carsten Waechter (＠toxieainc)

16 Mar 2025, 4:37 p.m.

Improve performance and precision of parse_numeric_literal() for doubles, and replace the artificial (INT_MAX/100) border conditions with the true +/- 308.

10^308 is the largest power of 10 representable in a double.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591
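The 308 bound quoted above matches the `DBL_MAX_10_EXP` macro from `<float.h>` on platforms where `double` is IEEE 754 binary64. A minimal sketch of that check follows; the helper name `power_of_ten_is_normal` is hypothetical and not part of the patch. Note that the lower end is not symmetric: `DBL_MIN_10_EXP` is -307 for normalized values, and smaller powers of ten are only reachable as subnormals.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper (not in the patch): true if 10^exp10 is a finite,
 * normalized double on this platform. For IEEE 754 binary64 the limits
 * are DBL_MIN_10_EXP == -307 and DBL_MAX_10_EXP == 308. */
static bool power_of_ten_is_normal(int exp10)
{
    return exp10 >= DBL_MIN_10_EXP && exp10 <= DBL_MAX_10_EXP;
}
```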

toxieainc

16 Mar, 4:38 p.m.

New subject: [PATCH 1/1] Improve performance and precision of parse_numeric_literal() for doubles, and replace artificial (INT_MAX/100) border conditions with the true +/- 308

From: toxieainc <toxie@ainc.de>

10^308 is the largest representable power of 10 in double
---
 dlls/vbscript/lex.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/dlls/vbscript/lex.c b/dlls/vbscript/lex.c
index 8c5c69ea429..414dc27f9cc 100644
--- a/dlls/vbscript/lex.c
+++ b/dlls/vbscript/lex.c
@@ -94,6 +94,31 @@ static const struct {
     {L"xor", tXOR}
 };
 
+static const double pow10[309] = {
+1.e0,1.e1,1.e2,1.e3,1.e4,1.e5,1.e6,1.e7,1.e8,1.e9,1.e10,1.e11,1.e12,1.e13,1.e14,1.e15,1.e16,
+1.e17,1.e18,1.e19,1.e20,1.e21,1.e22,1.e23,1.e24,1.e25,1.e26,1.e27,1.e28,1.e29,1.e30,1.e31,
+1.e32,1.e33,1.e34,1.e35,1.e36,1.e37,1.e38,1.e39,1.e40,1.e41,1.e42,1.e43,1.e44,1.e45,1.e46,
+1.e47,1.e48,1.e49,1.e50,1.e51,1.e52,1.e53,1.e54,1.e55,1.e56,1.e57,1.e58,1.e59,1.e60,1.e61,
+1.e62,1.e63,1.e64,1.e65,1.e66,1.e67,1.e68,1.e69,1.e70,1.e71,1.e72,1.e73,1.e74,1.e75,1.e76,
+1.e77,1.e78,1.e79,1.e80,1.e81,1.e82,1.e83,1.e84,1.e85,1.e86,1.e87,1.e88,1.e89,1.e90,1.e91,
+1.e92,1.e93,1.e94,1.e95,1.e96,1.e97,1.e98,1.e99,1.e100,1.e101,1.e102,1.e103,1.e104,1.e105,
+1.e106,1.e107,1.e108,1.e109,1.e110,1.e111,1.e112,1.e113,1.e114,1.e115,1.e116,1.e117,1.e118,
+1.e119,1.e120,1.e121,1.e122,1.e123,1.e124,1.e125,1.e126,1.e127,1.e128,1.e129,1.e130,1.e131,
+1.e132,1.e133,1.e134,1.e135,1.e136,1.e137,1.e138,1.e139,1.e140,1.e141,1.e142,1.e143,1.e144,
+1.e145,1.e146,1.e147,1.e148,1.e149,1.e150,1.e151,1.e152,1.e153,1.e154,1.e155,1.e156,1.e157,
+1.e158,1.e159,1.e160,1.e161,1.e162,1.e163,1.e164,1.e165,1.e166,1.e167,1.e168,1.e169,1.e170,
+1.e171,1.e172,1.e173,1.e174,1.e175,1.e176,1.e177,1.e178,1.e179,1.e180,1.e181,1.e182,1.e183,
+1.e184,1.e185,1.e186,1.e187,1.e188,1.e189,1.e190,1.e191,1.e192,1.e193,1.e194,1.e195,1.e196,
+1.e197,1.e198,1.e199,1.e200,1.e201,1.e202,1.e203,1.e204,1.e205,1.e206,1.e207,1.e208,1.e209,
+1.e210,1.e211,1.e212,1.e213,1.e214,1.e215,1.e216,1.e217,1.e218,1.e219,1.e220,1.e221,1.e222,
+1.e223,1.e224,1.e225,1.e226,1.e227,1.e228,1.e229,1.e230,1.e231,1.e232,1.e233,1.e234,1.e235,
+1.e236,1.e237,1.e238,1.e239,1.e240,1.e241,1.e242,1.e243,1.e244,1.e245,1.e246,1.e247,1.e248,
+1.e249,1.e250,1.e251,1.e252,1.e253,1.e254,1.e255,1.e256,1.e257,1.e258,1.e259,1.e260,1.e261,
+1.e262,1.e263,1.e264,1.e265,1.e266,1.e267,1.e268,1.e269,1.e270,1.e271,1.e272,1.e273,1.e274,
+1.e275,1.e276,1.e277,1.e278,1.e279,1.e280,1.e281,1.e282,1.e283,1.e284,1.e285,1.e286,1.e287,
+1.e288,1.e289,1.e290,1.e291,1.e292,1.e293,1.e294,1.e295,1.e296,1.e297,1.e298,1.e299,1.e300,
+1.e301,1.e302,1.e303,1.e304,1.e305,1.e306,1.e307,1.e308};
+
 static inline BOOL is_identifier_char(WCHAR c)
 {
     return iswalnum(c) || c == '_';
@@ -306,7 +331,7 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
 
         do {
             e = e*10 + *(ctx->ptr++) - '0';
-            if(sign == -1 && -e+exp < -(INT_MAX/100)) {
+            if(sign == -1 && -e+exp < -308) {
                 /* The literal will be rounded to 0 anyway. */
                 while(is_digit(*ctx->ptr))
                     ctx->ptr++;
@@ -314,7 +339,8 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
                 return tDouble;
             }
 
-            if(sign*e + exp > INT_MAX/100) {
+            if(sign*e + exp > 308) {
+                /* This would result in infinity. */
                 FIXME("Invalid numeric literal\n");
                 return 0;
             }
@@ -328,7 +354,7 @@ static int parse_numeric_literal(parser_ctx_t *ctx, void **ret)
         return tInt;
     }
 
-    r = exp>=0 ? d*pow(10, exp) : d/pow(10, -exp);
+    r = exp>=0 ? d*pow10[exp] : d/pow10[-exp];
     if(isinf(r)) {
         FIXME("Invalid numeric literal\n");
         return 0;

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/7591
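The last hunk of the patch scales the accumulated digits by a table entry. As a rough illustration of when that single multiply or divide is guaranteed to be correctly rounded, here is a minimal sketch of the well-known "fast path": the digits must fit in 53 bits and the power of ten must itself be exact, which holds for 10^0 through 10^22. This is not the patch's code; the names `exact_pow10` and `scale_exact` are hypothetical, and the sketch assumes IEEE 754 binary64 arithmetic without excess precision.

```c
#include <stdbool.h>
#include <stdint.h>

/* Powers of ten that are exactly representable as doubles (10^0 .. 10^22). */
static const double exact_pow10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/* Hypothetical helper: computes mantissa * 10^exp10 with one rounding.
 * Returns false when the inputs fall outside the range where both
 * operands are exact, in which case a full conversion routine is needed. */
static bool scale_exact(uint64_t mantissa, int exp10, double *out)
{
    if (mantissa > (UINT64_C(1) << 53))
        return false;               /* digits do not fit the 53-bit significand */
    if (exp10 < -22 || exp10 > 22)
        return false;               /* power of ten is not exact as a double */

    double d = (double)mantissa;    /* exact conversion */
    *out = exp10 >= 0 ? d * exact_pow10[exp10] : d / exact_pow10[-exp10];
    return true;
}
```

Outside that range, both the accumulated digits and the table entry may already carry rounding error, so the result can be off from the nearest double.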

Carsten Waechter (＠toxieainc)

9:26 p.m.

The failed test also seems to occur in other PRs?

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98030

Jacek Caban (＠jacek)

19 Mar, 2:50 p.m.

I think it would be great if we could remove that code and use something like `wcstod` instead. The main reason we reimplement that here is mostly historical; in the past we couldn't use msvcrt/ucrtbase functions, but that's no longer the case.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98420

Nikolay Sivov (＠nsivov)

2:59 p.m.

On Wed Mar 19 14:59:12 2025 +0000, Jacek Caban wrote:
> I think it would be great if we could remove that code and use something like `wcstod` instead. The main reason we reimplement that here is mostly historical; in the past we couldn't use msvcrt/ucrtbase functions, but that's no longer the case.

The problem with wcstod is that it depends on the current locale, which is not always desired. Maybe we could use something from oleaut32, even if duplicated?

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98423

Jacek Caban (＠jacek)

3:04 p.m.

On Wed Mar 19 14:59:12 2025 +0000, Nikolay Sivov wrote:
> The problem with wcstod is that it depends on current locale, and that's not always desired. Maybe we could use something from oleaut32, even if duplicated?

Using `_wcstod_l` would probably solve the locale problem.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98431

Carsten Waechter (＠toxieainc)

3:11 p.m.

On Wed Mar 19 15:04:44 2025 +0000, Jacek Caban wrote:
> Using `_wcstod_l` would probably solve the locale problem.

That would also be a nice solution.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98432

Carsten Waechter (＠toxieainc)

3:26 p.m.

Or maybe this is also another can of worms: https://medium.com/@tomysshadow/strtod-what-does-it-take-to-convert-strings-... (especially: 'This means — on Linux — that there is no way to guarantee that strtod will treat periods as decimal points. Of course, in practice, it’s probably safe to assume everyone has the "en_US" locale installed.')

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98436

Piotr Caban (＠piotr)

4 p.m.

On Wed Mar 19 16:00:55 2025 +0000, Carsten Waechter wrote:
> Or maybe this is also another can of worms: https://medium.com/@tomysshadow/strtod-what-does-it-take-to-convert-strings-... (especially: 'This means — on Linux — that there is no way to guarantee that strtod will treat periods as decimal points. Of course, in practice, it’s probably safe to assume everyone has the "en_US" locale installed.')

Note that the article you linked is incorrect regarding the C locale (and this is the locale you should be using in jscript while parsing numbers).

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98437

Carsten Waechter (＠toxieainc)

20 Mar, 8:01 a.m.

I read into it some more. The suggestions and remarks by Jacek and Piotr make sense, but I wonder whether this is a very large hammer for the conversion here. From my understanding, one would still need to do some basic parsing first to distinguish between 64-bit ints and doubles anyway, in order to do an exact conversion.

So some steps would be done twice. Additionally, one first has to create the C locale, then the follow-up call to _wcstod_l has to process that locale again before doing the conversion, and the locale has to be freed afterwards.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_98508
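The "basic parsing first" step mentioned above could look roughly like the following pre-scan, which only decides whether a literal is an integer or a double and how many characters it spans, leaving the actual conversion to a library routine. This is a minimal sketch under assumptions: `classify_literal` and `enum literal_kind` are hypothetical names, and the accepted grammar is a simplification, not the exact VBScript rules in lex.c.

```c
#include <stddef.h>
#include <wchar.h>

enum literal_kind { LITERAL_INVALID, LITERAL_INT, LITERAL_DOUBLE };

static int is_ascii_digit(wchar_t c)
{
    return c >= L'0' && c <= L'9';
}

/* Hypothetical pre-scan: classifies the literal at s and stores its length
 * in *len. A fraction or a complete exponent makes it a double. */
static enum literal_kind classify_literal(const wchar_t *s, size_t *len)
{
    const wchar_t *p = s;
    enum literal_kind kind = LITERAL_INT;

    /* Must start with a digit, or with '.' followed by a digit. */
    if (!is_ascii_digit(*p) && !(*p == L'.' && is_ascii_digit(p[1])))
        return LITERAL_INVALID;

    while (is_ascii_digit(*p))
        p++;

    if (*p == L'.') {
        kind = LITERAL_DOUBLE;
        p++;
        while (is_ascii_digit(*p))
            p++;
    }

    if (*p == L'e' || *p == L'E') {
        const wchar_t *q = p + 1;
        if (*q == L'+' || *q == L'-')
            q++;
        if (is_ascii_digit(*q)) {   /* only consume a complete exponent */
            kind = LITERAL_DOUBLE;
            while (is_ascii_digit(*q))
                q++;
            p = q;
        }
    }

    *len = (size_t)(p - s);
    return kind;
}
```

The scan is a single pass over the characters, so the duplicated work is limited to walking the digits once more before the conversion call.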

Carsten Waechter (＠toxieainc)

26 Mar, 10:46 a.m.

Any guidance on how to follow up on this then?

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_99124

Piotr Caban (＠piotr)

3 p.m.

On Wed Mar 26 15:00:24 2025 +0000, Carsten Waechter wrote:
> Any guidance on how to follow up on this then?

Doing string->double conversion correctly is hard, and it definitely makes sense to use a library for that. Even with your proposed patch, the jscript code is inaccurate.

Note that the C locale can be created once; there's no need to recreate it while parsing every number.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/7591#note_99128
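The "create the C locale once" suggestion can be sketched as below. On the Windows CRT the sketch uses `_create_locale` and `_wcstod_l`; elsewhere it falls back to POSIX `newlocale`/`uselocale` with plain `wcstod` so that it can be tried outside Wine. The function name `parse_double_c_locale` is hypothetical, the lazy initialization is not thread-safe, and the locale is intentionally never freed. Real code would more likely create the locale at module initialization.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* newlocale()/uselocale() on POSIX systems */
#endif
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts str to double using the "C" locale,
 * regardless of the locale currently set for the process or thread.
 * The locale object is created on first use and then reused. */
static double parse_double_c_locale(const wchar_t *str, wchar_t **end)
{
#ifdef _WIN32
    static _locale_t c_locale;

    if (!c_locale)
        c_locale = _create_locale(LC_ALL, "C");
    return _wcstod_l(str, end, c_locale);
#else
    static locale_t c_locale;
    locale_t prev;
    double ret;

    if (!c_locale)
        c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);

    /* Temporarily switch this thread to the C locale. If creation failed,
     * uselocale((locale_t)0) only queries and the current locale is used. */
    prev = uselocale(c_locale);
    ret = wcstod(str, end);
    uselocale(prev);
    return ret;
#endif
}
```

With this shape, the per-number cost is the conversion call itself; locale creation happens only on the first call.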
