Re: [PATCH] jscript: Ignore BOM mark in next_token. (try 5)

2 Oct 2014

      On 10/2/2014 14:48, Jacek Caban wrote:
...
On 10/02/14 08:29, Nikolay Sivov wrote:
...
...
 static BOOL skip_spaces(parser_ctx_t *ctx)
 {
-    while(ctx->ptr < ctx->end && isspaceW(*ctx->ptr)) {
+    while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) {
         if(is_endline(*ctx->ptr++))
             ctx->nl = TRUE;
     }
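To try the patched loop outside Wine, here is a self-contained sketch. `parser_ctx_t` is a hypothetical stand-in that has only the three fields the loop touches, and `WCHAR` is assumed to be a 16-bit `uint16_t`. `isspaceW()` is replaced by an ASCII-only stand-in, and `is_endline()` is assumed to match LF, CR, U+2028 and U+2029. The `return` statement is also an assumption, because the quoted hunk stops before the end of the function. The real Wine definitions may differ.

```c
#include <stdint.h>

typedef uint16_t WCHAR;   /* assumption: 16-bit UTF-16 code unit, as in Wine */
typedef int BOOL;
#define TRUE 1
#define FALSE 0

/* Hypothetical stand-in for the jscript parser context (only the fields used here). */
typedef struct {
    const WCHAR *ptr;   /* current read position */
    const WCHAR *end;   /* one past the last code unit */
    BOOL nl;            /* set once a line terminator has been skipped */
} parser_ctx_t;

/* Stand-in for isspaceW(): ASCII white space only (TAB, LF, VT, FF, CR, SP). */
static BOOL isspaceW(WCHAR c)
{
    return c == ' ' || (c >= 0x09 && c <= 0x0d);
}

/* Assumed line terminators: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR. */
static BOOL is_endline(WCHAR c)
{
    return c == '\n' || c == '\r' || c == 0x2028 || c == 0x2029;
}

/* Patched loop: U+FEFF is skipped like any other white space. */
static BOOL skip_spaces(parser_ctx_t *ctx)
{
    while(ctx->ptr < ctx->end && (isspaceW(*ctx->ptr) || *ctx->ptr == 0xFEFF /* UTF16 BOM */)) {
        if(is_endline(*ctx->ptr++))
            ctx->nl = TRUE;
    }
    /* Assumed return convention: TRUE if input remains after skipping. */
    return ctx->ptr != ctx->end;
}
```

For example, the input `{ 0xFEFF, ' ', '\n', 'x' }` leaves `ptr` on the `x` and sets `nl`. The BOM itself never sets `nl`, because it is not a line terminator.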
This looks correct according to ECMA-252 section 7.2 - all of the
following are white space:

tab 0x9 and vertical tab 0xb
form feed 0xc
space 0x20
NBSP 0xa0
UTF-16 BOM 0xfeff
any other Unicode "space separator"

Hopefully isspaceW() covers everything but the BOM. What worries me is
that isspaceW() is also used on its own in numerous other places in the
code. So we probably need more tests covering other places where space
separators can appear, and eventually our own is_space() helper that
conforms to the standard.
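A standard-conforming helper along those lines might look like the sketch below. `is_space` is a hypothetical name taken from the suggestion above, and `WCHAR` is assumed to be a 16-bit code unit. The set of characters follows the WhiteSpace list in ECMA-262 5.1 section 7.2. For USP, it uses the current Unicode "Zs" characters. U+180E was Zs when 5.1 was published and was reclassified in Unicode 6.3, so whether to include it is a judgement call.

```c
#include <stdint.h>

typedef uint16_t WCHAR;   /* assumption: 16-bit UTF-16 code unit */

/*
 * Hypothetical is_space(): the WhiteSpace set from ECMA-262 5.1, section 7.2.
 * Line terminators (LF, CR, U+2028, U+2029) are deliberately excluded;
 * the spec lists them separately as LineTerminator.
 */
static int is_space(WCHAR c)
{
    switch(c) {
    case 0x0009: /* TAB */
    case 0x000b: /* VT */
    case 0x000c: /* FF */
    case 0x0020: /* SP */
    case 0x00a0: /* NBSP */
    case 0xfeff: /* BOM */
    /* USP: the remaining Unicode "Zs" (space separator) characters */
    case 0x1680:
    case 0x202f:
    case 0x205f:
    case 0x3000:
        return 1;
    default:
        /* U+2000..U+200A are Zs as well */
        return c >= 0x2000 && c <= 0x200a;
    }
}
```

Because isspaceW() also matches LF and CR, a skip_spaces() built on this helper would need to test `is_space(c) || is_endline(c)`. Otherwise newlines would stop the loop before `nl` could be set.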
FWIW, ECMA-262 (which I usually use for jscript development) doesn't
mention the UTF-16 BOM as white space.
Sorry, 252 was a typo of course. It does mention it here:
http://www.ecma-international.org/ecma-262/5.1/#sec-7.2
...
Anyway, I agree that it would be
interesting to see whether it's considered white space in other places as
well. (I'm also fine with the patch in its current form, but an extended
version would obviously be better.)
Sure, I'm not saying it's wrong either, just pointing out a potential 
direction for improvement.
...
Jacek
