http://bugs.winehq.org/show_bug.cgi?id=9583
Summary: CompareStringW gives incorrect result for some wide strings
Product: Wine
Version: CVS/GIT
Platform: Other
OS/Version: other
Status: UNCONFIRMED
Severity: enhancement
Priority: P2
Component: wine-kernel
AssignedTo: wine-bugs@winehq.org
ReportedBy: peter@cendio.se
CompareStringW does not seem to work correctly on wide (Unicode) strings. I first noticed this when listbox controls sorted Swedish characters (say, LATIN SMALL LETTER A WITH RING ABOVE) right next to their base character (LATIN SMALL LETTER A). An example is http://www.cendio.se/~astrand/wine/40-listbox-sort/list.exe.
The cause seems to be that CompareStringW gives the wrong result. I've created a small test program, http://www.cendio.se/~astrand/wine/40-listbox-sort/comparestring.exe, with source available in the same directory. On Windows, the string "äpple" compares greater than "orange", which is correct. With the latest Wine CVS version, "äpple" compares less than "orange", which is wrong.
Dmitry Timoshkov dmitry@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|CVS/GIT                     |0.9.44.
--- Comment #1 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-04 10:24:40 ---
What is your locale under Windows? Could you run your test with the Windows locale set to English and see if that changes the result?
(Note: the tests above are not Wine related)
--- Comment #2 from Peter Åstrand peter@cendio.se 2007-09-05 02:21:11 ---
My locale is 0x41d. If I change the locale in Windows to English (USA) (0x409), the result changes: the string now compares as less instead. Here's a log:
H:>comparestring
locale is 0x41d
string1 is greater
H:>comparestring
locale is 0x409
string1 is less
--- Comment #3 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-05 02:31:05 ---
This means that Windows uses different sort weight tables depending on locale, and that's not really a Unicode sort (which is locale agnostic).
Does calling lstrcmpW show the same result?
--- Comment #4 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-05 02:52:38 ---
Also it would be interesting to see what sort keys LCMapStringW returns in both locales and where they differ.
--- Comment #5 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-05 03:04:31 ---
Another point that may make the testing easier: most likely just specifying an appropriate locale (0x41d or 0x409) in the 1st argument of CompareStringW and LCMapStringW is enough to avoid system locale changes.
--- Comment #6 from Peter Åstrand peter@cendio.se 2007-09-05 05:50:52 ---
> Does calling lstrcmpW show the same result?
Yes. With locale 0x41d, both give "greater" under Windows, while both give "less" under Wine.
> Also it would be interesting to see what sort keys LCMapStringW returns in both locales and where they differ.
I'm not familiar with LCMapStringW. What kind of flags should I call it with?
> Another point that may make the testing easier: most likely just specifying an appropriate locale (0x41d or 0x409) in the 1st argument of CompareStringW and LCMapStringW is enough to avoid system locale changes.
Since the test program prints out the locale at startup, there's no risk of mistakes. Using GetUserDefaultLCID() is convenient, since it allows us to test different locales just by changing the locale in the control panel, instead of having to recompile.
--- Comment #7 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-05 05:57:42 ---
> I'm not familiar with LCMapStringW. What kind of flags should I call it with?
LCMAP_SORTKEY
>> Another point that may make the testing easier: most likely just specifying an appropriate locale (0x41d or 0x409) in the 1st argument of CompareStringW and LCMapStringW is enough to avoid system locale changes.
> Since the test program prints out the locale at startup, there's no risk of mistakes. Using GetUserDefaultLCID() is convenient, since it allows us to test different locale just by changing the locale in the control panel, instead of having to recompile.
Passing an explicit locale to CompareStringW and LCMapStringW makes sure that you are testing real and not implicit behaviour. Also, having a common testing function that takes a locale id as a parameter helps to avoid recompilations.
--- Comment #8 from Peter Åstrand peter@cendio.se 2007-09-05 07:37:10 ---
> LCMAP_SORTKEY
Ok, I've tried calling LCMapStringW now, in both locales, but I don't really know how to interpret the result:
locale is 0x41d
CompareStringW: string1 is greater
lstrcmpW: string1 is greater
LCMapStringW result: 15
0xe 0xaf 0xe 0x7e 0xe 0x7e 0xe 0x48 0xe 0x21 0x1 0x1 0x1 0x1 0x0
locale is 0x409
CompareStringW: string1 is less
lstrcmpW: string1 is less
LCMapStringW result: 16
0xe 0x2 0xe 0x7e 0xe 0x7e 0xe 0x48 0xe 0x21 0x1 0x13 0x1 0x1 0x1 0x0
An updated source file is available on http://www.cendio.se/~astrand/wine/40-listbox-sort/.
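One way to answer "where they differ" for the two Windows sort keys printed above is a plain byte-by-byte scan. The following is a minimal sketch in portable C; the helper name first_difference and the array names key_sv and key_en are made up for illustration, and the byte values are copied from the two dumps in this comment.

```c
#include <stddef.h>

/* Return the index of the first byte at which two sort keys differ.
 * If one key is a prefix of the other, return the shorter length.
 * Return (size_t)-1 when the keys are identical. */
size_t first_difference(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;
    return alen == blen ? (size_t)-1 : n;
}

/* Sort keys for the test string as printed by the test program on Windows. */
const unsigned char key_sv[] = {            /* locale 0x41d, 15 bytes */
    0x0e, 0xaf, 0x0e, 0x7e, 0x0e, 0x7e, 0x0e, 0x48, 0x0e, 0x21,
    0x01, 0x01, 0x01, 0x01, 0x00 };
const unsigned char key_en[] = {            /* locale 0x409, 16 bytes */
    0x0e, 0x02, 0x0e, 0x7e, 0x0e, 0x7e, 0x0e, 0x48, 0x0e, 0x21,
    0x01, 0x13, 0x01, 0x01, 0x01, 0x00 };
```

Calling first_difference(key_sv, sizeof key_sv, key_en, sizeof key_en) reports the second byte (index 1), where 0xaf in the 0x41d key stands against 0x2 in the 0x409 key, so the two locales already disagree on the weight of the first character.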
--- Comment #9 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-05 23:12:11 ---
> Ok, I've tried calling LCMapStringW now, in both locales, but I don't really know how to interpret the result:
In order to make the result easier to interpret, it would help to call LCMapStringW for each character and print the original Unicode code point together with the resulting sort key.
> An updated source file is available on http://www.cendio.se/~astrand/wine/40-listbox-sort/.
Looks like the comparestring.cpp provided there isn't really the latest source. Also, please attach the tests here in the bug.
--- Comment #10 from Peter Åstrand peter@cendio.se 2007-09-06 03:36:24 ---
> In order to make the interpretation of the result easier it would help to call LCMapStringW for each character, and print original unicode code point together with resulting sort key.
Perhaps it would be easier if you modify the test program yourself, the way you want it?
> Looks like comparestring.cpp provided there isn't really the latest source.
Sorry about that, this should be fixed now.
--- Comment #11 from Dmitry Timoshkov dmitry@codeweavers.com 2007-09-06 03:43:38 ---
I have no time at the moment to work on this, so if you feel motivated enough to investigate, I'd prefer to just provide additional help where needed.
First we need to understand what the difference in behaviour is, and how the locale (and which locale) affects it. Once that is investigated, we can start thinking about possible solutions.
--- Comment #12 from Peter Åstrand peter@cendio.se 2007-09-06 07:13:22 ---
Ok, I've extended the test program now to call LCMapStringW for each character. Updated source and binary are in the usual location. I've also tried running it under both Wine and Windows, with both the US and Swedish locales. Here's the result:

Windows
-------
locale is 0x409
CompareStringW: string1 is less
lstrcmpW: string1 is less
LCMapStringW result: 16
0xe 0x2 0xe 0x7e 0xe 0x7e 0xe 0x48 0xe 0x21 0x1 0x13 0x1 0x1 0x1 0x0
char: 0xe4, sortkeys: 0xe 0x2 0x1 0x13 0x1 0x1 0x1 0x0
char: 0x70, sortkeys: 0xe 0x7e 0x1 0x1 0x1 0x1 0x0
char: 0x70, sortkeys: 0xe 0x7e 0x1 0x1 0x1 0x1 0x0
char: 0x6c, sortkeys: 0xe 0x48 0x1 0x1 0x1 0x1 0x0
char: 0x65, sortkeys: 0xe 0x21 0x1 0x1 0x1 0x1 0x0
char: 0x0, sortkeys: 0x1 0x1 0x1 0x1 0x0

locale is 0x41d
CompareStringW: string1 is greater
lstrcmpW: string1 is greater
LCMapStringW result: 15
0xe 0xaf 0xe 0x7e 0xe 0x7e 0xe 0x48 0xe 0x21 0x1 0x1 0x1 0x1 0x0
char: 0xe4, sortkeys: 0xe 0xaf 0x1 0x1 0x1 0x1 0x0
char: 0x70, sortkeys: 0xe 0x7e 0x1 0x1 0x1 0x1 0x0
char: 0x70, sortkeys: 0xe 0x7e 0x1 0x1 0x1 0x1 0x0
char: 0x6c, sortkeys: 0xe 0x48 0x1 0x1 0x1 0x1 0x0
char: 0x65, sortkeys: 0xe 0x21 0x1 0x1 0x1 0x1 0x0
char: 0x0, sortkeys: 0x1 0x1 0x1 0x1 0x0

Wine
----
locale is 0x409
CompareStringW: string1 is less
lstrcmpW: string1 is less
LCMapStringW result: 29
0xa 0x15 0xb 0x67 0xb 0x67 0xb 0x3 0xa 0x65 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0xe4 0x70 0x70 0x6c 0x65 0x1
char: 0xe4, sortkeys: 0xa 0x15 0x1 0x2 0x1 0x2 0x1 0xe4 0x1
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1
char: 0x6c, sortkeys: 0xb 0x3 0x1 0x2 0x1 0x2 0x1 0x6c 0x1
char: 0x65, sortkeys: 0xa 0x65 0x1 0x2 0x1 0x2 0x1 0x65 0x1
char: 0x0, sortkeys: 0x1 0x1 0x1 0x1

locale is 0x41d
CompareStringW: string1 is less
lstrcmpW: string1 is less
LCMapStringW result: 29
0xa 0x15 0xb 0x67 0xb 0x67 0xb 0x3 0xa 0x65 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0xe4 0x70 0x70 0x6c 0x65 0x1
char: 0xe4, sortkeys: 0xa 0x15 0x1 0x2 0x1 0x2 0x1 0xe4 0x1
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1
char: 0x6c, sortkeys: 0xb 0x3 0x1 0x2 0x1 0x2 0x1 0x6c 0x1
char: 0x65, sortkeys: 0xa 0x65 0x1 0x2 0x1 0x2 0x1 0x65 0x1
char: 0x0, sortkeys: 0x1 0x1 0x1 0x1
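The per-character dumps above look like several weight levels separated by 0x1 bytes and terminated by 0x0. That layout is an assumption read off the dumps, not a documented format, but under it the Windows keys for 0xe4 can be split into levels as in the sketch below. The helper name level_length and the array names are hypothetical.

```c
#include <stddef.h>

/* Count the bytes in one level of a sort key.
 * ASSUMPTION (inferred from the dumps, not documented): levels are
 * separated by 0x01 bytes and the key ends at the first 0x00 byte.
 * Level 0 is the first level. Returns 0 for a level that is empty
 * or not present. */
size_t level_length(const unsigned char *key, size_t len, unsigned level)
{
    size_t count = 0;
    for (size_t i = 0; i < len && key[i] != 0x00; i++) {
        if (key[i] == 0x01) {
            if (level == 0)
                return count;   /* reached the end of the requested level */
            level--;            /* skip over a separator */
        } else if (level == 0) {
            count++;            /* byte belongs to the requested level */
        }
    }
    return level == 0 ? count : 0;
}

/* Windows sort keys for char 0xe4, copied from the dumps above. */
const unsigned char a_uml_en[] = { 0x0e, 0x02, 0x01, 0x13, 0x01, 0x01, 0x01, 0x00 }; /* 0x409 */
const unsigned char a_uml_sv[] = { 0x0e, 0xaf, 0x01, 0x01, 0x01, 0x01, 0x00 };       /* 0x41d */
```

Under this reading, both keys have a two-byte first level, but only the 0x409 key has a non-empty second level (the single byte 0x13). One plausible interpretation, not confirmed anywhere in this report, is that the English table stores the diaeresis as a second-level difference while the Swedish table gives the character a first-level weight of its own.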
--- Comment #13 from Austin English austinenglish@gmail.com 2008-06-12 11:16:23 ---
Is this still an issue in current (1.0-rc4 or newer) Wine?
--- Comment #14 from Peter Åstrand peter@cendio.se 2008-06-13 01:32:36 ---
Yes. Tested with the latest CVS version (2007-06-12).
--- Comment #15 from Austin English austinenglish@gmail.com 2008-12-24 10:06:13 ---
Created an attachment (id=18173) --> (http://bugs.winehq.org/attachment.cgi?id=18173)
output under wine in git
--- Comment #16 from Austin English austinenglish@gmail.com 2008-12-24 10:08:37 ---
Created an attachment (id=18174) --> (http://bugs.winehq.org/attachment.cgi?id=18174)
output under windows 2000
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |download, source

--- Comment #17 from Austin English austinenglish@gmail.com 2008-12-24 10:09:24 ---
http://www.cendio.com/~astrand/wine/40-listbox-sort/comparestring.cpp
Still present in git. I used the comparestring.exe.
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |testcase
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://www.cendio.se/~astrand/wine/40-listbox-sort/comparestring.exe

--- Comment #18 from Austin English austinenglish@gmail.com 2010-06-13 04:49:18 ---
Still present in 1.2-rc3.
Julian Rüger jr98@gmx.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jr98@gmx.net
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|http://www.cendio.se/~astrand/wine/40-listbox-sort/comparestring.exe|http://www.cendio.com/~astrand/wine/40-listbox-sort/comparestring.exe

--- Comment #19 from Austin English austinenglish@gmail.com ---
austin@aw25 ~ $ wine comparestring.exe
locale is 0x409
CompareStringW: string1 is less
lstrcmpW: string1 is less
LCMapStringW result: 30
0xa 0x15 0xb 0x67 0xb 0x67 0xb 0x3 0xa 0x65 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0x2 0x2 0x2 0x2 0x2 0x1 0xe4 0x70 0x70 0x6c 0x65 0x1 0x0
char: 0xe4, sortkeys: 0xa 0x15 0x1 0x2 0x1 0x2 0x1 0xe4 0x1 0x0
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1 0x0
char: 0x70, sortkeys: 0xb 0x67 0x1 0x2 0x1 0x2 0x1 0x70 0x1 0x0
char: 0x6c, sortkeys: 0xb 0x3 0x1 0x2 0x1 0x2 0x1 0x6c 0x1 0x0
char: 0x65, sortkeys: 0xa 0x65 0x1 0x2 0x1 0x2 0x1 0x65 0x1 0x0
char: 0x0, sortkeys: 0x1 0x1 0x1 0x1 0x0
austin@aw25 ~ $ wine --version
wine-1.7.12

Still present.
Virgo Pärna virgo@gaiasoft.ee changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |virgo@gaiasoft.ee
--- Comment #20 from Virgo Pärna virgo@gaiasoft.ee ---
The problem also affects CompareStringA. For testing I made a Delphi 5 executable with this comparison:
CompareStringA(1061, 0, PChar('AA'), 2, PChar('ÄÄ'), 2)
It returns 1 on Windows 7, but 2 in Wine 1.4.1. It seems that Wine does the comparison in an accent-insensitive way, which is incorrect. I suspect that the underlying issue is the same for the ANSI and wide versions. This is a major issue, because any sorting or character comparison using those functions will work incorrectly.
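For readers unfamiliar with the numeric results quoted here: CompareStringA and CompareStringW return 1, 2 or 3 for less, equal and greater (0 means the call failed), so the Wine result of 2 means the two strings were treated as equal. A minimal sketch of that mapping follows; the CMP_* names are local stand-ins for the CSTR_* constants in the Windows headers, redefined so the snippet builds without windows.h, and the two helper functions are hypothetical.

```c
/* Local stand-ins mirroring CSTR_LESS_THAN, CSTR_EQUAL and
 * CSTR_GREATER_THAN from the Windows headers. */
enum {
    CMP_FAILED       = 0,
    CMP_LESS_THAN    = 1,
    CMP_EQUAL        = 2,
    CMP_GREATER_THAN = 3
};

/* Turn a CompareString-style return value into a readable message. */
const char *describe_result(int ret)
{
    switch (ret) {
    case CMP_LESS_THAN:    return "string1 is less";
    case CMP_EQUAL:        return "strings are equal";
    case CMP_GREATER_THAN: return "string1 is greater";
    default:               return "call failed";
    }
}

/* Subtracting 2 from a nonzero result gives the usual C runtime
 * convention: negative, zero or positive. */
int to_strcmp_style(int ret)
{
    return ret - 2;
}
```

With these helpers, the Windows 7 result above reads as "string1 is less" and the Wine 1.4.1 result as "strings are equal".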
--- Comment #21 from Virgo Pärna virgo@gaiasoft.ee ---
The problem seems to be that the call to CompareStringEx passes NULL as the locale name. That should actually be LOCALE_NAME_USER_DEFAULT, so when GetUserDefaultLCID returns the same value, it should work correctly. So it should call LCIDToLocaleName to get the proper locale name to pass to CompareStringEx.
--- Comment #22 from Virgo Pärna virgo@gaiasoft.ee ---
I'm not a C programmer, but as far as I can see, any call to CompareString or CompareStringEx ends up calling wine_compare_string. At that point, any locale information passed to CompareString or CompareStringEx (as the lcid or locale parameter) is lost. wine_compare_string ends up calling compare_unicode_weights, and CompareString/CompareStringEx report the strings as equal only if that call reports them equal (returns 0). compare_unicode_weights uses collation_table. That part of the code is a little too complex for me, but it appears that collation_table is compiled in and is incorrect for this use. Someone who is better at C can correct me if I misunderstood something.
CompareStringEx is in locale.c
wine_compare_string and compare_unicode_weights are in sortkey.c
collation_table is in collation.c
--- Comment #23 from Virgo Pärna virgo@gaiasoft.ee ---
This is starting to get quite complex. If I understand correctly, compare_unicode_weights is supposed to give the same result for A and Ä, and compare_diacritic_weights should then show the difference. Just for testing, I extracted the relevant parts from Wine and made a test program to see what happens. It seems that these lines:
    ce1 = collation_table[collation_table[*str1 >> 8] + (*str1 & 0xff)];
    ce2 = collation_table[collation_table[*str2 >> 8] + (*str2 & 0xff)];
give the same result for A and Ä. That part is exactly the same in compare_unicode_weights, compare_diacritic_weights and compare_case_weights, so it would always give the same results, AFAIU. And I really don't understand which part of http://www.unicode.org/reports/tr10/allkeys.txt collation.c contains.
From allkeys.txt:
0041 ; [.0A15.0020.0008.0041] # LATIN CAPITAL LETTER A
00C1 ; [.0A15.0020.0008.0041][.0000.0032.0002.0301] # LATIN CAPITAL LETTER A WITH ACUTE; QQCM
How does this information translate to collation_table? My Perl is unfortunately even worse than my C, but as far as I understand, the array is generated by the make_unicode script.
--- Comment #24 from Virgo Pärna virgo@gaiasoft.ee ---
> 00C1 ; [.0A15.0020.0008.0041][.0000.0032.0002.0301] # LATIN CAPITAL LETTER A WITH ACUTE; QQCM
Of course, Ä is actually
00C4 ; [.0A15.0020.0008.0041][.0000.0047.0002.0308] # LATIN CAPITAL LETTER A WITH DIAERESIS; QQCM
Anyway, collation_table gives 0x0A150151 for both A and Ä.
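The lookup quoted in comment #23 is a two-stage table: the high byte of the character selects a block offset, and the low byte indexes into that block. The sketch below reproduces the same indexing on a toy table that holds only the value reported above (0x0A150151 for both A and Ä). Everything except the indexing expression and that value is hypothetical: the toy table layout, the function names, and in particular the bit fields used to split an entry into weights, which are an assumption and not taken from this report.

```c
#include <stdint.h>

#define INDEX_SIZE 256
#define BLOCK_SIZE 256

/* Toy stand-in for collation_table: 256 index slots followed by a
 * single 256-entry block shared by every high byte. */
static uint32_t toy_table[INDEX_SIZE + BLOCK_SIZE];

void init_toy_table(void)
{
    for (int i = 0; i < INDEX_SIZE; i++)
        toy_table[i] = INDEX_SIZE;                /* every high byte -> the one block */
    for (int i = 0; i < BLOCK_SIZE; i++)
        toy_table[INDEX_SIZE + i] = 0xffffffffu;  /* placeholder for "no entry" */
    toy_table[INDEX_SIZE + 0x41] = 0x0A150151u;   /* 'A', value reported in the thread */
    toy_table[INDEX_SIZE + 0xC4] = 0x0A150151u;   /* 'Ä', value reported in the thread */
}

/* Same indexing expression as the lines quoted in comment #23. */
uint32_t lookup(uint16_t ch)
{
    return toy_table[toy_table[ch >> 8] + (ch & 0xff)];
}

/* ASSUMED field layout: high 16 bits = primary weight, next 8 bits =
 * diacritic weight, next 4 bits = case weight. */
int primary_diff(uint32_t ce1, uint32_t ce2)
{
    return (int)(ce1 >> 16) - (int)(ce2 >> 16);
}

int diacritic_diff(uint32_t ce1, uint32_t ce2)
{
    return (int)((ce1 >> 8) & 0xff) - (int)((ce2 >> 8) & 0xff);
}

int case_diff(uint32_t ce1, uint32_t ce2)
{
    return (int)((ce1 >> 4) & 0x0f) - (int)((ce2 >> 4) & 0x0f);
}
```

Whatever the real field layout is, the point made above holds in the sketch: when the table stores one identical 32-bit entry for two characters, every comparison derived from that entry reports no difference, so no later pass can tell A and Ä apart.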
Luke Bratch l_bratch@yahoo.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |l_bratch@yahoo.co.uk
--- Comment #25 from Virgo Pärna virgo@gaiasoft.ee ---
Created attachment 49623 --> https://bugs.winehq.org/attachment.cgi?id=49623
Testcase patch
Additional testcases for locale.c showing the problem. Also testbot job 9085.
Nikolay Sivov bunglehead@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49623|application/mbox            |text/plain
          mime type|                            |
Ken Sharp imwellcushtymelike@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |minor
zaplo00@mailfence.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zaplo00@mailfence.com
Fabian Maurer dark.shadow4@web.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dark.shadow4@web.de

--- Comment #26 from Fabian Maurer dark.shadow4@web.de ---
I'm currently working on a proper solution for this. I just need to get it upstreamed, which will probably still take some time.
omega@online.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |omega@online.de
--- Comment #27 from Fabian Maurer dark.shadow4@web.de ---
Essentially the same reason as bug 10767 - different collation tables. This should be fixed in staging, can you retest please? Also, should we merge those bugs?
--- Comment #28 from Virgo Pärna virgo@gaiasoft.ee ---
Tested with a custom-made Delphi 5 executable.

Windows:
CompareStringA(1024, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 1
CompareStringA(0, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 1

wine-4.0 (Debian 4.0-2):
CompareStringA(1024, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 2
CompareStringA(0, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 2

wine-6.0-101-g6c8029d8d0a:
CompareStringA(1024, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 1
CompareStringA(0, SORT_STRINGSORT, PChar('VAA'), 3, PChar('VÄÄ'), 3) returns 1

So it seems to be much better. Bear and wheelbarrow are no longer the same word...
VZ vz-wine@zeitlins.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vz-wine@zeitlins.org
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Fixed by SHA1|                            |d8c973ad95ba5e8a9a51df0dd9be587950179ec3
             Status|NEW                         |RESOLVED

--- Comment #29 from Alexandre Julliard julliard@winehq.org ---
This should be fixed by d8c973ad95ba5e8a9a51df0dd9be587950179ec3.
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #30 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 7.10.
--- Comment #31 from VZ vz-wine@zeitlins.org ---
Sorry, but this doesn't seem to be really fixed for me with Wine 7.0.
E.g. CompareStringEx("de-DE", NORM_IGNORECASE, "ß", "ss") yields 0 under native MSW but -1 with Wine. And CompareStringEx("sv", 0, "ä", "ae") gives -1 instead of the expected 1 for Swedish sort order.
--- Comment #32 from VZ vz-wine@zeitlins.org ---
(In reply to VZ from comment #31)
> Sorry, but this doesn't seem to be really fixed for me with Wine 7.0.
Damn, sorry for the noise. I somehow misread 7.10 in the previous comment as 7.0 and didn't realize that my version of Wine didn't have the fix yet. Of course, I noticed it immediately after posting the comment above. Please disregard it, and sorry again.