On 10/2/2014 14:48, Jacek Caban wrote:
On 10/02/14 08:29, Nikolay Sivov wrote:
 static BOOL skip_spaces(parser_ctx_t *ctx)
 {
-    while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
+    while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) {
         if(is_endline(*ctx->ptr++))
             ctx->nl = TRUE;
     }
This looks correct according to ECMA-252 section 7.2, where all of the following count as white space:
- tab and vertical tab, 0x9 and 0xb
- form feed, 0xc
- space, 0x20
- NBSP, 0xa0
- UTF-16 BOM, 0xfeff
- any other Unicode "space separator"
Hopefully isspaceW() covers everything but the BOM. What worries me is that isspaceW() is also used on its own in numerous other places in the code. So we probably need more tests covering other cases where space separators can appear, and later our own is_space() helper that conforms to the standard.
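For illustration only, a standalone is_space()-style helper along those lines might look like the sketch below. It is not part of the patch and not existing jscript code: the js_wchar typedef and the js_is_* names are hypothetical, and the hard-coded "space separator" (Zs) code points are an assumption taken from current Unicode data, so the exact set depends on the Unicode version. Line terminators are checked separately, since section 7.3 of the standard lists them apart from white space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a UTF-16 code unit (WCHAR in Wine). */
typedef uint16_t js_wchar;

/* LineTerminator per ECMA-262 5.1 section 7.3: LF, CR, LS, PS. */
static bool js_is_line_terminator(js_wchar c)
{
    return c == 0x000A || c == 0x000D || c == 0x2028 || c == 0x2029;
}

/* WhiteSpace per ECMA-262 5.1 section 7.2. */
static bool js_is_white_space(js_wchar c)
{
    switch (c) {
    case 0x0009: /* tab */
    case 0x000B: /* vertical tab */
    case 0x000C: /* form feed */
    case 0x0020: /* space */
    case 0x00A0: /* NBSP */
    case 0xFEFF: /* BOM */
    /* Remaining Unicode "space separator" (Zs) characters; this list
     * is an assumption based on current Unicode data. */
    case 0x1680:
    case 0x202F:
    case 0x205F:
    case 0x3000:
        return true;
    default:
        /* U+2000..U+200A are also in Zs. */
        return c >= 0x2000 && c <= 0x200A;
    }
}

/* What a skip_spaces()-style loop would test: white space or a newline. */
static bool js_is_space(js_wchar c)
{
    return js_is_white_space(c) || js_is_line_terminator(c);
}
```

With something like this, the loop condition in skip_spaces() would call js_is_space(*ctx->ptr) instead of combining isspaceW() with an explicit 0xFEFF check, and other isspaceW() callers could be switched over one by one as tests confirm the behaviour.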
FWIW, ECMA-262 (which I usually use for jscript development) doesn't mention the UTF-16 BOM as white space.
Sorry, 252 was a typo of course. It does mention it here: http://www.ecma-international.org/ecma-262/5.1/#sec-7.2
Anyway, I agree that it would be interesting to see if it's considered white space in other places as well. (I'm also fine with the patch in its current form, but an extended version would obviously be better.)
Sure, I'm not saying it's wrong either, just pointing out a potential direction for improvement.
Jacek