static BOOL skip_spaces(parser_ctx_t *ctx) {
- while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
- while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }
This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:
- tab and vertical tab, 0x9 and 0xb; - form feed 0xc - space 0x20 - NBSP 0xa0 - UTF-16 BOM 0xfeff - any other Unicode "space separator"
Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.
On 10/02/14 08:29, Nikolay Sivov wrote:
static BOOL skip_spaces(parser_ctx_t *ctx) {
- while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
- while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr
== 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }
This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:
- tab and vertical tab, 0x9 and 0xb;
- form feed 0xc
- space 0x20
- NBSP 0xa0
- UTF-16 BOM 0xfeff
- any other Unicode "space separator"
Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.
FWIW, ECMA-262 (which I usually use for jscript development) doesn't mention UTF-16 as white space. Anyway, I agree that it would be interesting to see if it's considered white space in other places as well. (I'm also fine with the patch in current form, but an extended version would be obviously better).
Jacek
On 10/2/2014 14:48, Jacek Caban wrote:
On 10/02/14 08:29, Nikolay Sivov wrote:
static BOOL skip_spaces(parser_ctx_t *ctx) {
- while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
- while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr
== 0xFEFF /* UTF16 BOM */)) { if(is_endline(*ctx->ptr++)) ctx->nl = TRUE; }
This looks correct according to ECMA-252 section 7.2 - all of the following is a whitespace:
- tab and vertical tab, 0x9 and 0xb;
- form feed 0xc
- space 0x20
- NBSP 0xa0
- UTF-16 BOM 0xfeff
- any other Unicode "space separator"
Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() itself is used in numerous places in code on its own. So probably we need more tests to cover more cases where space separators could be used, and later have our own is_space() call that will conform to the standard.
FWIW, ECMA-262 (which I usually use for jscript development) doesn't mention UTF-16 as white space.
Sorry, 252 was a typo of course. It does mention it here: http://www.ecma-international.org/ecma-262/5.1/#sec-7.2
Anyway, I agree that it would be interesting to see if it's considered white space in other places as well. (I'm also fine with the patch in current form, but an extended version would be obviously better).
Sure, I'm not saying it's wrong either, just pointing out a potential direction for improvement.
Jacek
Hi Nikolay,
Thanks for commenting, your thought is helpful as usual :)
I added tests for more space separators, also added tests for parseFloat(): https://testbot.winehq.org/JobDetails.pl?Key=9203&log_206=1#k206
The test result show that, older version of jscript doesn't flow the standard, while newer version is much better. The test result also show that, BOM and other space separators are also ignored in functions like parseFloat. I'm planing on adding tests for the following group of functions.
parseInt() eval() isNaN() isFinite() new Number() new Function()
new Date() Date.parse()
RegExp()
My idea is fixing the code in lex.c first, then fixing other place using isspaceW in an incremental way with tests provided, like the attached patches demos.
Currently there are two troubles bothering me: 1. We need a good way to skip space tests on older IEs 2. We need a good way to reduce the length of WCHAR array for the jscript code, I believe most of us hate a long array like below...
{'v','a','r',' ','a',' ','=',' ','p','a','r','s','e','F','l','o','a','t','(','"',SPACE_HOLDER,'3','.','1','4',SPACE_HOLDER,'"',')',';','o','k','(','a','=','=','3','.','1','4',',',' ','"','h','a','h','a','"',')',';','r','e','p','o','r','t','S','u','c','c','e','s','s','(',')',';','\0'}
Do you have any advice?
Thanks!
The patch has got in, but if you like, I can proposal a new version like the attached patches. (Need rebasing on latest Wine).
On Thu, Oct 2, 2014 at 7:26 PM, Qian Hong fracting@gmail.com wrote:
Hi Nikolay,
Thanks for commenting, your thought is helpful as usual :)
I added tests for more space separators, also added tests for parseFloat(): https://testbot.winehq.org/JobDetails.pl?Key=9203&log_206=1#k206
The test result show that, older version of jscript doesn't flow the standard, while newer version is much better. The test result also show that, BOM and other space separators are also ignored in functions like parseFloat. I'm planing on adding tests for the following group of functions.
parseInt() eval() isNaN() isFinite() new Number() new Function()
new Date() Date.parse()
RegExp()
My idea is fixing the code in lex.c first, then fixing other place using isspaceW in an incremental way with tests provided, like the attached patches demos.
Currently there are two troubles bothering me:
- We need a good way to skip space tests on older IEs
- We need a good way to reduce the length of WCHAR array for the
jscript code, I believe most of us hate a long array like below...
{'v','a','r',' ','a',' ','=',' ','p','a','r','s','e','F','l','o','a','t','(','"',SPACE_HOLDER,'3','.','1','4',SPACE_HOLDER,'"',')',';','o','k','(','a','=','=','3','.','1','4',',',' ','"','h','a','h','a','"',')',';','r','e','p','o','r','t','S','u','c','c','e','s','s','(',')',';','\0'}
Do you have any advice?
Thanks!
Hi Qian,
On 10/02/14 23:39, Qian Hong wrote:
The patch has got in, but if you like, I can proposal a new version like the attached patches. (Need rebasing on latest Wine).
If we're going to have more such tests, then I think a more generic way to write them would be nice. How about adding a new source file (say spaces.js) that would be preprocessed looking for a special sequence (like #\XXXX#) that would be replaced by WCHAR value?
Jacek