On 13 December 2016 at 19:48, Lauri Kenttä lauri.kentta@gmail.com wrote:
@@ -2027,6 +2027,19 @@ static int wctoint(WCHAR c, int base)
         v = c - 'A' + 10;
     else if ('a' <= c && c <= 'z')
         v = c - 'a' + 10;
-    else {
         /* Unicode points that contain digits 0-9; keep this sorted! */
         static const WCHAR zeros[] = {
             0x660, 0x6f0, 0x966, 0x9e6, 0xa66, 0xae6, 0xb66, 0xc66, 0xce6,
             0xd66, 0xe50, 0xed0, 0x1040, 0x17e0, 0x1810, 0xff10
         };
         int i;
         for (i = 0; i < sizeof(zeros)/sizeof(zeros[0]) && c >= zeros[i]; ++i) {
             if (zeros[i] <= c && c < zeros[i] + base) {
Using "base" here seems questionable. That would imply that e.g. with base 16, "\x6f2\x6fa"/"۲ۺ" would return "42". Is that really the case?
For what it's worth, note also that Wine has wine_fold_string(), which should be consistent with FoldString().
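The concern about "base" can be checked outside of msvcrt with a small sketch. This is not the patch itself: WCHAR16, unicode_digit() and parse_digits() are hypothetical names made up for illustration, and only the zeros[] table is copied from the quoted diff. The "width" parameter stands for the size of each digit range, so passing the base reproduces the questioned check and passing 10 gives the stricter one.

```c
#include <stddef.h>

/* Assumption: a 16-bit stand-in for Wine's WCHAR, so this builds anywhere. */
typedef unsigned short WCHAR16;

/* Zero code points copied from the quoted patch; sorted ascending. */
static const WCHAR16 zeros[] = {
    0x660, 0x6f0, 0x966, 0x9e6, 0xa66, 0xae6, 0xb66, 0xc66, 0xce6,
    0xd66, 0xe50, 0xed0, 0x1040, 0x17e0, 0x1810, 0xff10
};

/* Hypothetical helper: return the digit value of c, or -1 if c does not
 * fall within `width` code points of any listed zero. */
static int unicode_digit(WCHAR16 c, int width)
{
    size_t i;

    /* The table is sorted, so stop once c is below the next zero. */
    for (i = 0; i < sizeof(zeros) / sizeof(zeros[0]) && c >= zeros[i]; ++i) {
        if (c < zeros[i] + width)
            return c - zeros[i];
    }
    return -1;
}

/* Hypothetical helper: accumulate the leading run of valid digits in s. */
static long parse_digits(const WCHAR16 *s, size_t n, int base, int width)
{
    long v = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        int d = unicode_digit(s[i], width);
        if (d < 0 || d >= base)
            break;
        v = v * base + d;
    }
    return v;
}
```

With the string { 0x6f2, 0x6fa } and base 16, a width equal to the base accepts U+06FA as the digit 10 and yields 2 * 16 + 10 = 42, while a width of 10 rejects U+06FA and stops after the first digit, yielding 2. Whether native _wcstoi64() behaves like the second case is exactly what a test on Windows would have to show.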
On 12/14/16 11:23, Henri Verbeet wrote:
Using "base" here seems questionable. That would imply that e.g. with base 16, "\x6f2\x6fa"/"۲ۺ" would return "42". Is that really the case?
That's why I have asked for a test that shows the problem.
On 14 December 2016 at 11:34, Piotr Caban piotr.caban@gmail.com wrote:
That's why I have asked for a test that shows the problem.
Right, I wrote that before reading your reply.
On 2016-12-14 12:23, Henri Verbeet wrote:
Using "base" here seems questionable. That would imply that e.g. with base 16, "\x6f2\x6fa"/"۲ۺ" would return "42". Is that really the case?
My bad, I've "optimized out" the missing c < zeros[i] + 10.
For what it's worth, note also that Wine has wine_fold_string(), which should be consistent with FoldString().
I don't see what FoldString could do here.
On 12/14/2016 05:45 PM, Lauri Kenttä wrote:
On 2016-12-14 12:23, Henri Verbeet wrote:
For what it's worth, note also that Wine has wine_fold_string(), which should be consistent with FoldString().
I don't see what FoldString could do here.
MAP_FOLDDIGITS looks relevant to what you're doing.
On 14 December 2016 at 15:54, Nikolay Sivov nsivov@codeweavers.com wrote:
On 12/14/2016 05:45 PM, Lauri Kenttä wrote:
On 2016-12-14 12:23, Henri Verbeet wrote:
For what it's worth, note also that Wine has wine_fold_string(), which should be consistent with FoldString().
I don't see what FoldString could do here.
MAP_FOLDDIGITS looks relevant to what you're doing.
Yeah. MAP_FOLDDIGITS will map the various unicode digits to 0-9. It does that based on the unicode tables, which means you wouldn't have to maintain a separate list of unicode digit ranges in msvcrt. That assumes wcstoi64() is consistent with FoldString(), which of course it may not be.
On 2016-12-14 16:59, Henri Verbeet wrote:
Yeah. MAP_FOLDDIGITS will map the various unicode digits to 0-9. It does that based on the unicode tables, which means you wouldn't have to maintain a separate list of unicode digit ranges in msvcrt. That assumes wcstoi64() is consistent with FoldString(), which of course it may not be.
Unfortunately MAP_FOLDDIGITS seems to map too many things (e.g. Tamil, which is already tested and shouldn't work).
I've now verified the list of zeros with the obvious test program:
    WCHAR buf[4] = {0}, *end;
    int i, nl = 0;
    for (buf[0] = '0'; buf[0] != 0; ++buf[0]) {
        end = buf;
        i = _wcstoi64(buf, &end, 36);
        if (end != buf) {
            printf("U+%04X = %d\n", buf[0], i);
        }
    }
There's no point in including this in my patch, though, I guess.
On 14 December 2016 at 17:27, Lauri Kenttä lauri.kentta@gmail.com wrote:
Unfortunately MAP_FOLDDIGITS seems to map too many things (e.g. Tamil, which is already tested and shouldn't work).
Does it on Windows as well? That is, our unicode tables aren't necessarily quite the same as the Microsoft ones, and the Microsoft ones aren't necessarily the same across Windows versions. Likewise, is that consistent across different versions of msvcr? I could imagine different msvcr versions matching different unicode versions.
On 2016-12-14 18:36, Henri Verbeet wrote:
Does it on Windows as well? That is, our unicode tables aren't necessarily quite the same as the Microsoft ones, and the Microsoft ones aren't necessarily the same across Windows versions. Likewise, is that consistent across different versions of msvcr? I could imagine different msvcr versions matching different unicode versions.
Testing on Windows Server 2003, I get 84 code points which wcstoi64 doesn't recognize but MAP_FOLDDIGITS manages to convert. These include superscripts, subscripts, circled numbers etc., and Tamil.
The most interesting question is, what's wrong with Tamil? :(
Maybe we could use MAP_FOLDDIGITS anyway, but bug for bug, right?
And as for different Windows versions, my wctoint seems to be fine: https://testbot.winehq.org/JobDetails.pl?Key=27358