Hi All,
I've written a regression test that shows what the undocumented flag 0x10000000 does. shlwapi.StrIsIntlEqualW/A passes this flag to CompareStringW/A. I found the difference by writing a short program that compared the output of CompareString with and without the flag for all Unicode characters... it took a while to run :)
The flag (0x10000000) passed to CompareString reverses the sort order of a number of Unicode characters. I've no idea why it would want to do that... maybe somebody can shed some light on the reason behind this?
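The scanning program itself isn't included in this mail. What follows is a minimal sketch of the idea, and it rests on some assumptions. Each BMP code unit is compared against one fixed reference character, because the pairing actually used isn't stated. The comparison goes through a hypothetical callback, `char_cmp_fn`, which returns the same values as CompareStringW (1, 2 and 3 for less, equal and greater). On Windows that callback would wrap CompareStringW on two one-character strings. `scan_flag_differences` and `toy_cmp` are illustrative names, not Wine or Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical comparator: same return convention as CompareStringW
 * (1 = less, 2 = equal, 3 = greater) for two one-character strings.
 * On Windows it would wrap CompareStringW; here it is left abstract. */
typedef int (*char_cmp_fn)(uint32_t flags, uint16_t a, uint16_t b);

/* Compare every BMP code unit c (except U+0000) against `ref`, once with
 * no flags and once with `flag`, and record the code units whose result
 * changes. Returns the total number found; at most `max` are stored. */
size_t scan_flag_differences(char_cmp_fn cmp, uint32_t flag, uint16_t ref,
                             uint16_t *out, size_t max)
{
    size_t found = 0;
    for (uint32_t c = 1; c <= 0xFFFF; c++) {
        int plain = cmp(0, (uint16_t)c, ref);
        int flagged = cmp(flag, (uint16_t)c, ref);
        if (plain != flagged) {
            if (found < max)
                out[found] = (uint16_t)c;
            found++;
        }
    }
    return found;
}

/* Toy comparator for exercising the scan only, NOT a model of Windows:
 * ordinal order, except that with 0x10000000 set the result is reversed
 * for code units in U+0300..U+036F. */
static int toy_cmp(uint32_t flags, uint16_t a, uint16_t b)
{
    int r = (a < b) ? 1 : (a > b) ? 3 : 2;
    if ((flags & 0x10000000u) && a >= 0x0300 && a <= 0x036F && r != 2)
        r = 4 - r;   /* swap less (1) and greater (3) */
    return r;
}
```

With the toy comparator and the reference character 'A' (U+0041), the scan reports the 112 code units U+0300..U+036F. Those are the only ones whose order the toy flips.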
Mike
ChangeLog:
* Add a test case for the undocumented CompareStringW flag 0x10000000.
Index: dlls/kernel/tests/locale.c
===================================================================
RCS file: /home/wine/wine/dlls/kernel/tests/locale.c,v
retrieving revision 1.33
diff -u -r1.33 locale.c
--- dlls/kernel/tests/locale.c	21 Nov 2004 15:47:24 -0000	1.33
+++ dlls/kernel/tests/locale.c	21 Nov 2004 17:01:12 -0000
@@ -1410,6 +1410,76 @@
 }
 #endif
 
+static void test_CompareString_undoc(void)
+{
+    WCHAR table1[] = {
+        0x0651, 0x3005, 0x3031, 0x3032, 0x309d, 0x309e, 0x30fc, 0x30fd,
+        0x30fe, 0xfe7c, 0xfe7d, 0xff70
+    };
+    WCHAR table2[] = {
+        0x02b9, 0x02ba, 0x02bb, 0x02bc, 0x02bd, 0x02be, 0x02bf, 0x02c0,
+        0x02c1, 0x02c2, 0x02c3, 0x02c4, 0x02c5, 0x02c8, 0x02cc, 0x02cd,
+        0x02ce, 0x02cf, 0x02d1, 0x02d2, 0x02d3, 0x02d4, 0x02d5, 0x02d6,
+        0x02d7, 0x02de, 0x02e4, 0x02e5, 0x02e6, 0x02e7, 0x02e8, 0x02e9,
+        0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0305, 0x0306, 0x0307,
+        0x0308, 0x0309, 0x030a, 0x030b, 0x030c, 0x030d, 0x030e, 0x030f,
+        0x0310, 0x0311, 0x0312, 0x0313, 0x0314, 0x0315, 0x0316, 0x0317,
+        0x0318, 0x0319, 0x031a, 0x031b, 0x031c, 0x031d, 0x031e, 0x031f,
+        0x0320, 0x0321, 0x0322, 0x0323, 0x0324, 0x0325, 0x0326, 0x0327,
+        0x0328, 0x0329, 0x032a, 0x032b, 0x032c, 0x032d, 0x032e, 0x032f,
+        0x0330, 0x0331, 0x0332, 0x0333, 0x0334, 0x0335, 0x0336, 0x0337,
+        0x0338, 0x0339, 0x033a, 0x033b, 0x033c, 0x033d, 0x033e, 0x033f,
+        0x0340, 0x0341, 0x0342, 0x0343, 0x0344, 0x0345, 0x0346, 0x0347,
+        0x0348, 0x0370, 0x0371, 0x0372, 0x0483, 0x0484, 0x0485, 0x0486,
+        0x0559, 0x055a, 0x0591, 0x0592, 0x0593, 0x0594, 0x0595, 0x0596,
+        0x0597, 0x0598, 0x0599, 0x059a, 0x059b, 0x059c, 0x059d, 0x059e,
+        0x059f, 0x05a0, 0x05a1, 0x05a2, 0x05a3, 0x05a4, 0x05a5, 0x05a6,
+        0x05a7, 0x05a8, 0x05a9, 0x05aa, 0x05ab, 0x05ac, 0x05ad, 0x05ae,
+        0x05af, 0x05b0, 0x05b1, 0x05b2, 0x05b3, 0x05b4, 0x05b5, 0x05b6,
+        0x05b7, 0x05b8, 0x05b9, 0x05ba, 0x05bb, 0x05bc, 0x05bd, 0x05bf,
+        0x05c0, 0x05c1, 0x05c2, 0x093c, 0x0951, 0x0952, 0x0953, 0x0954,
+        0x0981, 0x09bc, 0x09c1, 0x09c2, 0x09c3, 0x09c4, 0x09cd, 0x09e2,
+        0x09e3, 0x0a02, 0x0a3c, 0x0a41, 0x0a42, 0x0a47, 0x0a48, 0x0a4b,
+        0x0a4c, 0x0a70, 0x0a71, 0x0a81, 0x0a82, 0x0abc, 0x0ac1, 0x0ac2,
+        0x0ac3, 0x0ac4, 0x0ac5, 0x0ac7, 0x0ac8, 0x0acd, 0x0b01, 0x0b3c,
+        0x0b3f, 0x0b41, 0x0b42, 0x0b43, 0x0b4d, 0x0bcd, 0x0c3e, 0x0c3f,
+        0x0c40, 0x0c46, 0x0c47, 0x0c48, 0x0c4a, 0x0c4b, 0x0c4c, 0x0c4d,
+        0x0c55, 0x0c56, 0x0cbf, 0x0cc6, 0x0ccd, 0x0d41, 0x0d42, 0x0d43,
+        0x0d4d, 0x0e47, 0x0e48, 0x0e49, 0x0e4a, 0x0e4b, 0x0e4c, 0x0e4d,
+        0x0eb1, 0x0eb4, 0x0eb5, 0x0eb6, 0x0eb7, 0x0eb8, 0x0eb9, 0x0ebb,
+        0x0ebc, 0x0ec8, 0x0ec9, 0x0eca, 0x0ecb, 0x0ecc, 0x0ecd, 0x1026,
+        0x1027, 0x1028, 0x1029, 0x102a, 0x102e, 0x1030, 0x1036, 0x1037,
+        0x103b, 0x103d, 0x103e, 0x104b, 0x104c, 0x20d0, 0x20d1, 0x20d2,
+        0x20d3, 0x20d4, 0x20d5, 0x20d6, 0x20d7, 0x20d8, 0x20d9, 0x20da,
+        0x20db, 0x20dc, 0x20dd, 0x20de, 0x20df, 0x20e0, 0x20e1, 0x302a,
+        0x302b, 0x302c, 0x302d, 0x302e, 0x302f, 0x3099, 0x309a, 0x309b,
+        0x309c, 0xfb1e, 0xff9e, 0xff9f
+    };
+    DWORD r,i,j;
+    WCHAR x[2],y[2];
+    BOOL bPass = TRUE;
+
+    x[1]=0;
+    y[1]=0;
+    for(i=0; bPass && (i<sizeof table1/sizeof table1[0]); i++)
+    {
+        x[0] = table1[i];
+        for(j=0; bPass && (j<sizeof table2/sizeof table2[0]); j++ )
+        {
+            y[0] = table2[j];
+            r = CompareStringW(LOCALE_SYSTEM_DEFAULT, 0x10000000, x, 2, y, 2 );
+            if (r!=CSTR_GREATER_THAN)
+                bPass = FALSE;
+            r = CompareStringW(LOCALE_SYSTEM_DEFAULT, 0, x, 2, y, 2 );
+            if (r==CSTR_LESS_THAN)
+                bPass = FALSE;
+        }
+    }
+    todo_wine {
+        ok(bPass,"undocumented flag 0x10000000 test failed\n");
+    }
+}
+
 static void test_FoldStringA(void)
 {
     int ret, i;
@@ -2103,6 +2173,7 @@
     test_GetCurrencyFormatA(); /* Also tests the W version */
     test_GetNumberFormatA(); /* Also tests the W version */
     test_CompareStringA();
+    test_CompareString_undoc();
     test_LCMapStringA();
     test_LCMapStringW();
     test_FoldStringA();
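`bPass` only records that some pair failed. A variant that stops at the first failing pair and reports its indices can make a failure easier to chase. The portable sketch below applies the same pass condition as test_CompareString_undoc to every (table1[i], table2[j]) pair: with the flag, the result must be CSTR_GREATER_THAN, and without it, the result must not be CSTR_LESS_THAN. The comparator callback and `check_undoc_pairs` are hypothetical. The CSTR values are redefined locally so the sketch builds without Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CompareStringW return values (CSTR_LESS_THAN etc.), redefined here. */
enum { CMP_LESS = 1, CMP_EQUAL = 2, CMP_GREATER = 3 };

#define UNDOC_FLAG 0x10000000u

/* Hypothetical comparator; on Windows it would call CompareStringW on two
 * one-character strings with the given flags. */
typedef int (*char_cmp_fn)(uint32_t flags, uint16_t a, uint16_t b);

/* Returns 1 if every pair passes. Otherwise returns 0 and stores the
 * indices of the first failing pair in *bad_i and *bad_j. */
int check_undoc_pairs(char_cmp_fn cmp,
                      const uint16_t *t1, size_t n1,
                      const uint16_t *t2, size_t n2,
                      size_t *bad_i, size_t *bad_j)
{
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            if (cmp(UNDOC_FLAG, t1[i], t2[j]) != CMP_GREATER ||
                cmp(0, t1[i], t2[j]) == CMP_LESS) {
                *bad_i = i;
                *bad_j = j;
                return 0;
            }
        }
    }
    return 1;
}

/* Toy comparator for testing only (not Windows behaviour): ordinal
 * comparison of the two code units, flags ignored. */
static int ordinal_cmp(uint32_t flags, uint16_t a, uint16_t b)
{
    (void)flags;
    return (a < b) ? CMP_LESS : (a > b) ? CMP_GREATER : CMP_EQUAL;
}
```

In a Wine test, the reported indices could be fed straight into the `ok()` message so that the failing pair of characters shows up in the log.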
"Mike McCormack" mike@codeweavers.com wrote:
The flag (0x10000000) passed to CompareString reverses the sort order of a number of Unicode characters. I've no idea why it would want to do that... maybe somebody can shed some light on the reason behind this?
Just a shot in the dark: perhaps the flag is supposed to force CompareString to perform character reordering first (taking bidirectional layout into account) and only then do the actual string comparison? Perhaps calling GetCharacterPlacement with the GCP_REORDER flag set and comparing the results would tell us more.
Dmitry Timoshkov wrote:
Just a shot in the dark: perhaps the flag is supposed to force CompareString to perform character reordering first (taking bidirectional layout into account) and only then do the actual string comparison?
A. BiDi strings are compared in logical order. Reordering is just about the last thing you do before display.
B. This changes the greater-than/less-than semantics, not the order.
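A toy example makes point A concrete. It assumes that each string is a single right-to-left run, so display order is just the characters reversed. Real bidi reordering (Unicode UAX #9) is much more involved. Under that assumption, comparing after reordering flips the result for two Hebrew letters (alef U+05D0, bet U+05D1), even with a plain ordinal comparison. The functions are illustrative only. Per point A, comparison happens in logical order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinal comparison of two UTF-16 strings of equal length n: -1, 0 or 1. */
int ordinal_cmp_n(const uint16_t *a, const uint16_t *b, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (a[k] != b[k])
            return a[k] < b[k] ? -1 : 1;
    return 0;
}

/* Crude "visual order" for a string that is one RTL run: plain reversal.
 * This models only the simplest bidi case. */
static void reverse_copy(const uint16_t *src, uint16_t *dst, size_t n)
{
    for (size_t k = 0; k < n; k++)
        dst[k] = src[n - 1 - k];
}

/* Compare two equal-length RTL strings after "reordering" them for display. */
int visual_cmp_n(const uint16_t *a, const uint16_t *b, size_t n)
{
    uint16_t va[16], vb[16];
    assert(n <= 16);
    reverse_copy(a, va, n);
    reverse_copy(b, vb, n);
    return ordinal_cmp_n(va, vb, n);
}
```

In logical order, alef-bet sorts before bet-alef. After reversal, it sorts after it. That is the kind of flip that reordering before comparing would introduce.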
Then again, this does not make sense any way you turn it. The regression test shows that table 1 sorts lower than table 2 with the flag and higher without it. Let's look at it. Table 1 has three characters in various forms: the Arabic "Shadda" and two CJK marks (the prolonged sound mark, and the iteration marks with their voiced variants). Table 2 has quite a few characters. Take the range I know well, Hebrew (U+0590-U+05FF): table 2 contains all of its diacritics and "Ta'amim" (cantillation) marks.
This makes no sense. I even asked on the Arabeyes project's IRC channel. Comparing Arabic and Hebrew is totally meaningless. The other characters in table 2 belong to the following blocks:
- Spacing Modifier Letters (U+02B0-U+02FF)
- Combining Diacritical Marks (U+0300-U+036F)
- Greek (three characters, all marked as "reserved")
- Cyrillic, Devanagari, Bengali and Gurmukhi

That's where I gave up looking up the names of languages I didn't even know existed. I will mention "Combining Diacritical Marks for Symbols" (U+20D0-U+20FF), though, which I think is crucial to understanding this.
I will also mention "CJK Symbols and Punctuation" (U+3000-U+303F) and Hiragana (U+3040-U+309F).
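Looking up block names by hand gets tedious quickly. A small lookup table covers every code point in table1 and table2 from the patch. The ranges are a subset of the Unicode Character Database's Blocks.txt. `block_name` and `struct ublock` are illustrative helpers, not part of Wine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of Unicode block ranges (from Blocks.txt) covering the code
 * points in table1 and table2. Sorted and non-overlapping. */
struct ublock { uint16_t first, last; const char *name; };

static const struct ublock blocks[] = {
    { 0x02B0, 0x02FF, "Spacing Modifier Letters" },
    { 0x0300, 0x036F, "Combining Diacritical Marks" },
    { 0x0370, 0x03FF, "Greek and Coptic" },
    { 0x0400, 0x04FF, "Cyrillic" },
    { 0x0530, 0x058F, "Armenian" },
    { 0x0590, 0x05FF, "Hebrew" },
    { 0x0600, 0x06FF, "Arabic" },
    { 0x0900, 0x097F, "Devanagari" },
    { 0x0980, 0x09FF, "Bengali" },
    { 0x0A00, 0x0A7F, "Gurmukhi" },
    { 0x0A80, 0x0AFF, "Gujarati" },
    { 0x0B00, 0x0B7F, "Oriya" },
    { 0x0B80, 0x0BFF, "Tamil" },
    { 0x0C00, 0x0C7F, "Telugu" },
    { 0x0C80, 0x0CFF, "Kannada" },
    { 0x0D00, 0x0D7F, "Malayalam" },
    { 0x0E00, 0x0E7F, "Thai" },
    { 0x0E80, 0x0EFF, "Lao" },
    { 0x1000, 0x109F, "Myanmar" },
    { 0x20D0, 0x20FF, "Combining Diacritical Marks for Symbols" },
    { 0x3000, 0x303F, "CJK Symbols and Punctuation" },
    { 0x3040, 0x309F, "Hiragana" },
    { 0x30A0, 0x30FF, "Katakana" },
    { 0xFB00, 0xFB4F, "Alphabetic Presentation Forms" },
    { 0xFE70, 0xFEFF, "Arabic Presentation Forms-B" },
    { 0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms" },
};

/* Binary search over the ranges above; NULL if cp is outside this subset. */
const char *block_name(uint16_t cp)
{
    size_t lo = 0, hi = sizeof blocks / sizeof blocks[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < blocks[mid].first)
            hi = mid;
        else if (cp > blocks[mid].last)
            lo = mid + 1;
        else
            return blocks[mid].name;
    }
    return NULL;
}
```

You can loop over both tables and print `block_name()` for each entry. That gives the per-block breakdown without manual lookups. For example, table 1 turns out to span Arabic, CJK Symbols and Punctuation, Hiragana, Katakana, Arabic Presentation Forms-B and Halfwidth and Fullwidth Forms.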
Now, here's the thing. ALL the symbols in table 2 are diacritics or punctuation marks written either below or above the letter. They are combining marks, which do not change the letter's width. Shadda in Arabic, on the other hand, doubles the pronunciation of the letter it combines with. In other words, this undocumented flag means that letters doubled in Arabic should sort after other languages' diacritics. It's still apples and oranges, but maybe we have a clue as to "why". Does anyone here know what the CJK marks mean?
Out of interest, could it be that U+0A01 needs to be added to table 2? If so, we may have a solution to what this flag means. Mike, can you test it?
Shachar
Shachar Shemesh wrote:
Attached is a small test case that demonstrates a problem in Wine's CompareString. Explanation: the word "Mohammad" is spelled in Arabic as Meem, Hah, Meem, Shadda, Dal. The Shadda doubles the pronunciation of the preceding letter, which is why the name is written with a double M in English. The attached program uses CompareStringW to compare the proper spelling of "Mohammad" against three variations:
- one without the Shadda at all (as Mohammad is usually spelled by modern Arabic writers);
- one with the Meem explicitly doubled;
- one like the last, but with the second of the doubled Meems replaced by the letter that precedes Meem in the alphabet (Lam).

They are labelled "mohamad", "mohammad" and "mohamlad" respectively. The properly spelled version is labelled "moham_ad".
moham_ad is compared against all three with no flags, with the "NORM_IGNORENONSPACE" flag, and with our unidentified flag. The program was run on Windows 2000 and on Wine.
On W2K, moham_ad sorts between mohamlad and mohammad. If NORM_IGNORENONSPACE is used, it is exactly equal to mohammad. On Wine, moham_ad is less than all three. This means we have a bug in Wine.
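The attached program isn't reproduced here. The portable sketch below makes a few assumptions:
- It encodes the four spellings as UTF-16 code units (Meem U+0645, Hah U+062D, Shadda U+0651, Dal U+062F, Lam U+0644).
- It includes only the Windows 2000 orderings reported above. No results are given for the undocumented flag, so none are encoded.
- It uses a hypothetical comparator callback. On Windows, such a callback could return `CompareStringW(...) - 2`, which gives the <0/0/>0 convention if the 0 error return is ignored.

`ordinal_str_cmp` and `count_w2k_mismatches` are illustrative names. For contrast, a plain code-unit comparison ranks moham_ad above all three spellings. That is because U+0651 is numerically larger than U+062F, U+0644 and U+0645, so ordinal order matches neither the W2K nor the Wine results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four spellings as NUL-terminated UTF-16 code units. */
const uint16_t moham_ad[] = { 0x0645, 0x062D, 0x0645, 0x0651, 0x062F, 0 }; /* Meem Hah Meem Shadda Dal */
const uint16_t mohamad[]  = { 0x0645, 0x062D, 0x0645, 0x062F, 0 };         /* no Shadda */
const uint16_t mohammad[] = { 0x0645, 0x062D, 0x0645, 0x0645, 0x062F, 0 }; /* Meem doubled */
const uint16_t mohamlad[] = { 0x0645, 0x062D, 0x0645, 0x0644, 0x062F, 0 }; /* 2nd Meem -> Lam */

#define IGNORENONSPACE 0x00000002u /* value of NORM_IGNORENONSPACE */

/* Hypothetical comparator returning <0, 0 or >0. */
typedef int (*str_cmp_fn)(uint32_t flags, const uint16_t *a, const uint16_t *b);

/* Plain code-unit comparison, flags ignored; for contrast only. */
int ordinal_str_cmp(uint32_t flags, const uint16_t *a, const uint16_t *b)
{
    (void)flags;
    while (*a && *a == *b) { a++; b++; }
    return (*a > *b) - (*a < *b);
}

/* Orderings reported on Windows 2000: sign of cmp(moham_ad, other). */
struct expectation { uint32_t flags; const uint16_t *other; int sign; };
static const struct expectation w2k[] = {
    { 0,              mohamlad, +1 },  /* sorts after mohamlad */
    { 0,              mohammad, -1 },  /* sorts before mohammad */
    { IGNORENONSPACE, mohammad,  0 },  /* equal to mohammad */
};

/* Number of reported W2K orderings that `cmp` fails to reproduce. */
int count_w2k_mismatches(str_cmp_fn cmp)
{
    int bad = 0;
    for (size_t k = 0; k < sizeof w2k / sizeof w2k[0]; k++) {
        int r = cmp(w2k[k].flags, moham_ad, w2k[k].other);
        int sign = (r > 0) - (r < 0);
        if (sign != w2k[k].sign)
            bad++;
    }
    return bad;
}
```

Plugged into the same table, a correct CompareStringW replacement should report zero mismatches. The ordinal comparison reproduces only the first of the three.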
Shachar