Re: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

List overview All Threads

newer

older

Re: [PATCH 5/5] ws2_32: Return...

Re: [PATCH] user32/tests: Test...

Nikolay Sivov

2 Oct 2014 2 Oct '14

6:29 a.m.

...

static BOOL skip_spaces(parser_ctx_t *ctx) {

while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {

while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }

This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:

- tab and vertical tab, 0x9 and 0xb; - form feed 0xc - space 0x20 - NBSP 0xa0 - UTF-16 BOM 0xfeff - any other Unicode "space separator"

Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.

Show replies by date

Jacek Caban

2 Oct 2 Oct

10:48 a.m.

New subject: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

On 10/02/14 08:29, Nikolay Sivov wrote:

...

...
static BOOL skip_spaces(parser_ctx_t *ctx) {

while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {

while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr

== 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }

This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:

tab and vertical tab, 0x9 and 0xb;

form feed 0xc

space 0x20

NBSP 0xa0

UTF-16 BOM 0xfeff

any other Unicode "space separator"

Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.

FWIW, ECMA-262 (which I usually use for jscript development) doesn't mention UTF-16 as white space. Anyway, I agree that it would be interesting to see if it's considered white space in other places as well. (I'm also fine with the patch in current form, but an extended version would be obviously better).

Jacek

Nikolay Sivov

11:06 a.m.

New subject: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

On 10/2/2014 14:48, Jacek Caban wrote:

...

On 10/02/14 08:29, Nikolay Sivov wrote:

...
...
static BOOL skip_spaces(parser_ctx_t *ctx) {

while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {

while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr

== 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }

This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:

tab and vertical tab, 0x9 and 0xb;

form feed 0xc

space 0x20

NBSP 0xa0

UTF-16 BOM 0xfeff

any other Unicode "space separator"

Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.

FWIW, ECMA-262 (which I usually use for jscript development) doesn't mention UTF-16 as white space.

Sorry, 252 was a typo of course. It does mention it here: http://www.ecma-international.org/ecma-262/5.1/#sec-7.2

...

Anyway, I agree that it would be interesting to see if it's considered white space in other places as well. (I'm also fine with the patch in current form, but an extended version would be obviously better).

Sure, I'm not saying it's wrong either, just pointing out a potential direction for improvement.

...

Jacek

Qian Hong

11:26 a.m.

New subject: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

Hi Nikolay,

Thanks for commenting, your thought is helpful as usual :)

I added tests for more space separators, also added tests for parseFloat(): https://testbot.winehq.org/JobDetails.pl?Key=9203&log_206=1#k206

The test result show that, older version of jscript doesn't flow the standard, while newer version is much better. The test result also show that, BOM and other space separators are also ignored in functions like parseFloat. I'm planing on adding tests for the following group of functions.

parseInt() eval() isNaN() isFinite() new Number() new Function()

new Date() Date.parse()

RegExp()

My idea is fixing the code in lex.c first, then fixing other place using isspaceW in an incremental way with tests provided, like the attached patches demos.

Currently there are two troubles bothering me: 1. We need a good way to skip space tests on older IEs 2. We need a good way to reduce the length of WCHAR array for the jscript code, I believe most of us hate a long array like below...

{'v','a','r',' ','a',' ','=',' ','p','a','r','s','e','F','l','o','a','t','(','"',SPACE_HOLDER,'3','.','1','4',SPACE_HOLDER,'"',')',';','o','k','(','a','=','=','3','.','1','4',',',' ','"','h','a','h','a','"',')',';','r','e','p','o','r','t','S','u','c','c','e','s','s','(',')',';','\0'}

Do you have any advice?

Thanks!

Qian Hong

9:39 p.m.

New subject: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

The patch has got in, but if you like, I can proposal a new version like the attached patches. (Need rebasing on latest Wine).

On Thu, Oct 2, 2014 at 7:26 PM, Qian Hong fracting@gmail.com wrote:

...

Hi Nikolay,

Thanks for commenting, your thought is helpful as usual :)

I added tests for more space separators, also added tests for parseFloat(): https://testbot.winehq.org/JobDetails.pl?Key=9203&log_206=1#k206

The test result show that, older version of jscript doesn't flow the standard, while newer version is much better. The test result also show that, BOM and other space separators are also ignored in functions like parseFloat. I'm planing on adding tests for the following group of functions.

parseInt() eval() isNaN() isFinite() new Number() new Function()

new Date() Date.parse()

RegExp()

My idea is fixing the code in lex.c first, then fixing other place using isspaceW in an incremental way with tests provided, like the attached patches demos.

Currently there are two troubles bothering me:

We need a good way to skip space tests on older IEs

We need a good way to reduce the length of WCHAR array for the

jscript code, I believe most of us hate a long array like below...

{'v','a','r',' ','a',' ','=',' ','p','a','r','s','e','F','l','o','a','t','(','"',SPACE_HOLDER,'3','.','1','4',SPACE_HOLDER,'"',')',';','o','k','(','a','=','=','3','.','1','4',',',' ','"','h','a','h','a','"',')',';','r','e','p','o','r','t','S','u','c','c','e','s','s','(',')',';','\0'}

Do you have any advice?

Thanks!

-- Regards, Qian Hong - http://www.winehq.org

Jacek Caban

3 Oct 3 Oct

2:50 p.m.

New subject: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

Hi Qian,

On 10/02/14 23:39, Qian Hong wrote:

...

The patch has got in, but if you like, I can proposal a new version like the attached patches. (Need rebasing on latest Wine).

If we're going to have more such tests, then I think a more generic way to write them would be nice. How about adding a new source file (say spaces.js) that would be preprocessed looking for a special sequence (like #\XXXX#) that would be replaced by WCHAR value?

Jacek

3924

Age (days ago)

3925

Last active (days ago)

wine-devel@winehq.org

5 comments

3 participants

tags (0)

participants (3)

Jacek Caban
Nikolay Sivov
Qian Hong