I'm trying to figure out why CompareStringA returns CSTR_EQUAL for the strings "\1" and "\2". (See bug 5469, and the todo_wine test case in dlls/kernel/tests/locale.c)
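For reference, it can be reproduced standalone with something like this (just an illustrative sketch, not the actual todo_wine test in locale.c):

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        /* Wine currently returns CSTR_EQUAL (2) here; the todo_wine test
           expects "\1" and "\2" to compare as different strings. */
        int ret = CompareStringA(LOCALE_USER_DEFAULT, 0, "\1", 1, "\2", 1);
        printf("CompareStringA returned %d\n", ret);
        return 0;
    }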
CompareStringA does the usual thing: it calls MultiByteToWideChar and then CompareStringW. So CompareStringW ends up comparing L"\0001" to L"\0002".
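(Roughly like the sketch below; the function name and fixed-size buffers are made up for illustration, this is just the usual A->W pattern, not the actual kernel32 code:)

    #include <windows.h>

    /* Simplified sketch of the ANSI->Unicode wrapper pattern; the real
       CompareStringA sizes its buffers properly and handles conversion
       failures. */
    static int compare_string_ansi(LCID lcid, DWORD flags,
                                   const char *s1, int n1, const char *s2, int n2)
    {
        WCHAR w1[128], w2[128];
        int l1 = MultiByteToWideChar(CP_ACP, 0, s1, n1, w1, 128);
        int l2 = MultiByteToWideChar(CP_ACP, 0, s2, n2, w2, 128);
        return CompareStringW(lcid, flags, w1, l1, w2, l2);
    }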
CompareStringW calls wine_compare_string, in libs/unicode/sortkey.c. That calls compare_unicode_weights, which has this little bit of code:

    ce1 = collation_table[collation_table[*str1 >> 8] + (*str1 & 0xff)];
    ce2 = collation_table[collation_table[*str2 >> 8] + (*str2 & 0xff)];
With the strings L"\0001" and L"\0002", *str1 is 0x0001, and *str2 is 0x0002. So *str1 >> 8 is 0, and *str2 >> 8 is 0. *str1 & 0xff is 0x01, *str2 & 0xff is 0x02. So, ce1 == collation_table[1], which is 0x00000300 (in collation.c), and ce2 == collation_table[2], which is 0x00000400.
That gets us here:

    if (ce1 != (unsigned int)-1 && ce2 != (unsigned int)-1)
        ret = (ce1 >> 16) - (ce2 >> 16);
    else
        ret = *str1 - *str2;
Well, 0x00000300 >> 16 is 0, and so is 0x00000400 >> 16, so (ce1 >> 16) - (ce2 >> 16) is 0, and the strings are considered equal. But as the test case shows, they're not supposed to be.
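Putting those two fragments together, the primary-weight pass boils down to roughly this (a simplified sketch, not the exact sortkey.c code; it ignores flags, differing string lengths, and the other weight levels):

    #include <windows.h>

    extern const unsigned int collation_table[];  /* the generated table in collation.c */

    /* Simplified sketch of the primary-weight comparison described above. */
    static int compare_primary(const WCHAR *str1, int len1, const WCHAR *str2, int len2)
    {
        int ret = 0;
        while (len1 > 0 && len2 > 0 && !ret)
        {
            unsigned int ce1 = collation_table[collation_table[*str1 >> 8] + (*str1 & 0xff)];
            unsigned int ce2 = collation_table[collation_table[*str2 >> 8] + (*str2 & 0xff)];
            if (ce1 != (unsigned int)-1 && ce2 != (unsigned int)-1)
                ret = (ce1 >> 16) - (ce2 >> 16);  /* compare only the high 16 bits of each entry */
            else
                ret = *str1 - *str2;              /* fall back to raw code points */
            str1++; str2++; len1--; len2--;
        }
        return ret;  /* length tie-break omitted in this sketch */
    }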
I'm just not sure what to do about it. Changing collation.c isn't really an option, since it's generated. So there's some flaw in the logic here, but I don't understand the meaning of collation_table. Could someone explain to me what it is?
Thanks, --Juan
----- Original Message -----
From: "Juan Lang" juan_lang@yahoo.com
To: wine-devel@winehq.org
Sent: Wednesday, June 28, 2006 12:20 AM
Subject: Debugging string comparison problem
That's really a problem with collation.c, or rather with the file it's been generated from, www.unicode.org/reports/tr10/allkeys.txt. There are a lot of differences between that file and Microsoft's implementation. We have some hacks in CrossOver to compensate for it; what I did was just fix up the allkeys.txt from unicode.org and regenerate collation.c.
Juan Lang wrote:
I'm trying to figure out why CompareStringA returns CSTR_EQUAL for the strings "\1" and "\2". (See bug 5469, and the todo_wine test case in dlls/kernel/tests/locale.c)
CompareStringA does the usual thing, calls MultiByteToWideChar and calls CompareStringW. So CompareStringW is comparing L"\0001" to L"\0002".
CompareStringW calls wine_compare_string, in libs/unicode/sortkey.c. That calls compare_unicode_weights, which has this little bit of code:

    ce1 = collation_table[collation_table[*str1 >> 8] + (*str1 & 0xff)];
    ce2 = collation_table[collation_table[*str2 >> 8] + (*str2 & 0xff)];
With the strings L"\0001" and L"\0002", *str1 is 0x0001, and *str2 is 0x0002. So *str1 >> 8 is 0, and *str2 >> 8 is 0. *str1 & 0xff is 0x01, *str2 & 0xff is 0x02. So, ce1 == collation_table[1], which is 0x00000300 (in collation.c), and ce2 == collation_table[2], which is 0x00000400.
You missed the two collation_table lookups. The first lookup is to find an index (the table is stored with some trivial compression). This will be 0x200 for both *str1 and *str2. Then the second lookup is done for collation_table[0x201] and collation_table[0x202] and these are both 0 (see the data beginning with line /* 0x0000 .. 0x00ff */ in collation.c).
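In other words it's a two-level, 256-entry-block scheme, roughly like the sketch below (the function name is made up, and the 0x200 offset is just the value mentioned above for a 0x00 high byte):

    #include <windows.h>

    extern const unsigned int collation_table[];  /* the generated table in collation.c */

    /* Two-level lookup: the entry for the character's high byte gives the start
       of a 256-entry block, and the low byte indexes into that block.  The
       "trivial compression" presumably comes from identical blocks sharing
       storage. */
    static unsigned int get_collation_entry(WCHAR ch)
    {
        unsigned int block = collation_table[ch >> 8];  /* 0x200 for any ch below 0x100 */
        return collation_table[block + (ch & 0xff)];    /* 0 for both 0x0001 and 0x0002 */
    }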
Note that on Windows using CompareString on L"\0001\0002" and L"\0002\0001" gives a result of CSTR_EQUAL, so I don't think the bug is in the collation tables.
You missed the two collation_table lookups.
You're right, I did miss that.
Note that on Windows using CompareString on L"\0001\0002" and L"\0002\0001" gives a result of CSTR_EQUAL, so I don't think the bug is in the collation tables.
Really? For which locale, and which version of Windows? For US English, on WinXP, it returns CSTR_LESS_THAN for me. Here's a quick proggie:
    LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
    BSTR str1 = SysAllocStringLen(L"\0001\0002", 2);
    BSTR str2 = SysAllocStringLen(L"\0002\0001", 2);

    printf("VarBstrCmp returns %ld\n", VarBstrCmp(str1, str2, lcid, 0));
    printf("CompareStringW returns %d\n", CompareStringW(lcid, 0, str1, 2, str2, 2));

The output is:
VarBstrCmp returns 0
CompareStringW returns 1
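(For anyone who wants to build it, that fragment needs roughly this scaffolding; it's the same code, just made compilable, linked against oleaut32 for VarBstrCmp:)

    #include <stdio.h>
    #include <windows.h>
    #include <oleauto.h>

    int main(void)
    {
        LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
        BSTR str1 = SysAllocStringLen(L"\0001\0002", 2);
        BSTR str2 = SysAllocStringLen(L"\0002\0001", 2);

        /* On WinXP (US English) this prints 0 and 1 here, i.e. VARCMP_LT and
           CSTR_LESS_THAN. */
        printf("VarBstrCmp returns %ld\n", VarBstrCmp(str1, str2, lcid, 0));
        printf("CompareStringW returns %d\n", CompareStringW(lcid, 0, str1, 2, str2, 2));

        SysFreeString(str1);
        SysFreeString(str2);
        return 0;
    }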
--Juan
Juan Lang wrote:
You missed the two collation_table lookups.
You're right, I did miss that.
Note that on Windows using CompareString on L"\0001\0002" and L"\0002\0001" gives a result of CSTR_EQUAL, so I don't think the bug is in the collation tables.
Really? For which locale, and which version of Windows? For US English, on WinXP, it returns CSTR_LESS_THAN for me. Here's a quick proggie:
    LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
    BSTR str1 = SysAllocStringLen(L"\0001\0002", 2);
    BSTR str2 = SysAllocStringLen(L"\0002\0001", 2);
I think the bug is here. Using SysAllocString instead of SysAllocStringLen I get:
VarBstrCmp returns 1
CompareStringW returns 2
printf("VarBstrCmp returns %ld\n", VarBstrCmp(str1, str2, lcid, 0)); printf("CompareStringW returns %d\n", CompareStringW(lcid, 0, str1, 2, str2, 2));
The output is: VarBstrCmp returns 0 CompareStringW returns 1
I think the bug is here. Using SysAllocString instead of SysAllocStringLen I get:
VarBstrCmp returns 1
CompareStringW returns 2
Using SysAllocString, I get that the length of each string is 0. So yeah, they'd be equal.
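That's the documented difference between the two allocators: SysAllocString takes its length from the terminating null, so a leading NUL gives an empty BSTR, while SysAllocStringLen copies exactly the count you pass. A quick sketch:

    #include <stdio.h>
    #include <windows.h>
    #include <oleauto.h>

    int main(void)
    {
        BSTR a = SysAllocString(L"\0001\0002");        /* length comes from the string scan: stops at the leading NUL */
        BSTR b = SysAllocStringLen(L"\0001\0002", 2);  /* copies exactly 2 WCHARs, NUL included */
        printf("SysAllocString gives length %u\n", SysStringLen(a));     /* 0 */
        printf("SysAllocStringLen gives length %u\n", SysStringLen(b));  /* 2 */
        SysFreeString(a);
        SysFreeString(b);
        return 0;
    }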
So, I'm back to where I started: in Windows, these are different characters, but in Wine they're not. How do I fix it?
--Juan