http://bugs.winehq.org/show_bug.cgi?id=26632
Summary: MultiByteToWideChar with MB_ERR_INVALID_CHARS doesn't fail for some code points.
Product: Wine
Version: 1.3.17
Platform: x86
OS/Version: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: kernel32
AssignedTo: wine-bugs@winehq.org
ReportedBy: sagawa.aki+winebugs@gmail.com
Created an attachment (id=33899) --> (http://bugs.winehq.org/attachment.cgi?id=33899) test MB_ERR_INVALID_CHARS
I ran the attached source code on both Wine and Windows XP. For some code pages, including Japanese (CP932), the results don't match.
For instance, Japanese Windows marks `X' (conversion failed) for 0xA0, 0xFD, 0xFE and 0xFF, but Wine (LANG=ja_JP.UTF-8) marks `o' (OK) for them. This only happens when I pass MB_ERR_INVALID_CHARS to MultiByteToWideChar.
This article might help: http://blogs.msdn.com/b/michkap/archive/2007/07/25/4037646.aspx
--- Comment #1 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-02 03:43:01 CDT ---
Created an attachment (id=33900) --> (http://bugs.winehq.org/attachment.cgi?id=33900) run result (Windows XP, Windows 7)
--- Comment #2 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-02 03:45:40 CDT ---
Created an attachment (id=33901) --> (http://bugs.winehq.org/attachment.cgi?id=33901) run result (Wine 1.3.17)
Sagawa sagawa.aki+winebugs@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #33899|application/octet-stream    |text/plain
          mime type|                            |
--- Comment #3 from Nikolay Sivov bunglehead@gmail.com 2011-04-02 04:35:18 CDT ---
Is it another difference in Microsoft interpretation of Unicode?
--- Comment #4 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-02 07:55:23 CDT ---
(In reply to comment #3)
> Is it another difference in Microsoft interpretation of Unicode?
Probably yes. Their implementation converts some undefined single-byte characters to the Unicode Private Use Area (PUA). This is needed for round-trip conversion (e.g. ANSI 0xFF becomes U+F8F3, which must then convert back to ANSI 0xFF).
However, the PUA is not a proper place to map them, because the Unicode standard does not define characters in that area. Thus the MB_ERR_INVALID_CHARS flag does not allow conversion to the PUA.
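As a rough illustration of the round trip described above, here is a minimal C sketch. The table is a hypothetical four-entry excerpt, not Wine's or Windows' actual data: the 0xFF / U+F8F3 pair comes from this comment, and the other three values assume the mapping listed in bestfit932.txt. The names pua_pair, pua_from_byte and byte_from_pua are made up for this example.

```c
#include <stddef.h>

/* Hypothetical excerpt: the four CP932 bytes discussed in this report and
 * their Private Use Area stand-ins (assumed to match bestfit932.txt). */
struct pua_pair { unsigned char byte; unsigned short wch; };

static const struct pua_pair cp932_pua[] = {
    { 0xA0, 0xF8F0 },
    { 0xFD, 0xF8F1 },
    { 0xFE, 0xF8F2 },
    { 0xFF, 0xF8F3 },
};

#define PUA_COUNT (sizeof(cp932_pua) / sizeof(cp932_pua[0]))

/* Byte -> UTF-16 code unit; returns 0 if the byte is not in the excerpt. */
unsigned short pua_from_byte(unsigned char byte)
{
    size_t i;
    for (i = 0; i < PUA_COUNT; i++)
        if (cp932_pua[i].byte == byte)
            return cp932_pua[i].wch;
    return 0;
}

/* UTF-16 code unit -> byte; returns -1 if the code unit is not in the excerpt. */
int byte_from_pua(unsigned short wch)
{
    size_t i;
    for (i = 0; i < PUA_COUNT; i++)
        if (cp932_pua[i].wch == wch)
            return cp932_pua[i].byte;
    return -1;
}
```

With this table, byte_from_pua(pua_from_byte(0xFF)) gives back 0xFF, which is the round-trip property the PUA mapping is meant to preserve.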
--- Comment #5 from Nikolay Sivov bunglehead@gmail.com 2011-04-02 08:47:17 CDT ---
AFAIK Wine's Unicode tables are generated directly from unicode.org data, and I'm sure there has already been a request to customize that (in cases where Windows does things differently). I don't remember why it was rejected, probably because MS tweaks its implementation from release to release and we want to rely on standard data.
Somebody experienced in that area could help here. Dmitry?
Dan Kegel dank@kegel.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dank@kegel.com

--- Comment #6 from Dan Kegel dank@kegel.com 2011-04-02 10:14:04 CDT ---
What is the impact on the user of this problem?
--- Comment #7 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-02 23:10:41 CDT ---
(In reply to comment #5)
> AFAIK wine's unicode tables are generated directly from unicode.org data, and I'm sure it already was a request to customize that (case Windows does some things other way).
The Unicode tables are not the problem. The tables don't cover MB_ERR_INVALID_CHARS behavior. Without flags, the results (WCHARs) are the same as on Windows. In my opinion, Wine should raise an error when a conversion with MB_ERR_INVALID_CHARS yields certain PUA code points (as Windows does). Currently Wine raises an error only when it meets a default Unicode character.
(In reply to comment #6)
> What is the impact on the user of this problem?
Encoding detection, for instance. In Japanese CP932, 0xFE is an undefined character that never appears in a valid sequence, but in EUC-JP (another encoding) it is part of valid code sequences. On Windows, the MB_ERR_INVALID_CHARS flag helps tell the two apart.
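The detection use case above could look roughly like the following C sketch. It is a structural check only, written under stated assumptions: the lead-byte and trail-byte ranges are the usual CP932 ranges, 0x80 is accepted as a single byte because the Windows results in this report accept it, and unassigned double-byte codes are not checked. The function name could_be_cp932 and its helpers are hypothetical and are not part of Wine or the Windows API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed CP932 lead-byte ranges: 0x81-0x9F and 0xE0-0xFC. */
static bool is_cp932_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Assumed CP932 trail-byte ranges: 0x40-0x7E and 0x80-0xFC. */
static bool is_cp932_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Returns false as soon as a byte that strict CP932 decoding would reject
 * is seen: 0xA0, 0xFD-0xFF, or a lead byte without a valid trail byte. */
bool could_be_cp932(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        if (b == 0xA0 || b >= 0xFD)
            return false;               /* the bytes this report is about */
        if (is_cp932_lead(b)) {
            if (i + 1 >= len || !is_cp932_trail(s[i + 1]))
                return false;           /* truncated or malformed pair */
            i += 2;
        } else {
            i += 1;                     /* ASCII, 0x80, or half-width katakana */
        }
    }
    return true;
}
```

A detector would call this on the input and fall back to another candidate such as EUC-JP when it returns false. A byte pair like 0xA4 0xFE, which lies in the EUC-JP byte range, is rejected here because of the 0xFE.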
Dmitry Timoshkov dmitry@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1

--- Comment #8 from Dmitry Timoshkov dmitry@codeweavers.com 2011-04-03 00:15:05 CDT ---
(In reply to comment #0)
> For instance, Japanese Windows marks `X' (conversion fail) for 0xA0, 0xFD, 0xFE and 0xFF. But Wine (LANG=ja_JP.UTF-8) marks `o' (OK) for them.
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT specifically marks 0xA0, 0xFD, 0xFE and 0xFF as #UNDEFINED; our parser needs to mark those as invalid somehow.
--- Comment #9 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-03 00:49:12 CDT ---
Created an attachment (id=33913) --> (http://bugs.winehq.org/attachment.cgi?id=33913) proposed patch
--- Comment #10 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-03 01:10:11 CDT ---
(In reply to comment #8)
> ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT specifically marks 0xA0, 0xFD, 0xFE and 0xFF as #UNDEFINED, our parser needs to mark those as invalid somehow.
Yes, that's right.
Please note that 0x80 is also marked as #UNDEFINED in CP932.TXT, but MultiByteToWideChar with MB_ERR_INVALID_CHARS doesn't complain about it.
The big difference between them is that 0xA0, 0xFD, 0xFE and 0xFF are mapped into the Private Use Area in bestfit932.txt [1], while 0x80 is not.
[1] ... ftp://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt
I wrote the proposed patch based on this observation.
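The observation above can be expressed as a small predicate. The C sketch below is only an illustration of the idea and is not taken from the attached patch: it assumes that the strict check looks at the code unit a byte maps to and rejects anything in the BMP Private Use Area (U+E000 to U+F8FF), and that 0x80 maps to U+0080 as in bestfit932.txt. The names is_private_use and strict_accepts are hypothetical.

```c
#include <stdbool.h>

/* True if wch lies in the BMP Private Use Area, U+E000..U+F8FF. */
bool is_private_use(unsigned short wch)
{
    return wch >= 0xE000 && wch <= 0xF8FF;
}

/* Hypothetical strict-mode decision: a mapped code unit is accepted
 * unless it falls in the Private Use Area. */
bool strict_accepts(unsigned short mapped)
{
    return !is_private_use(mapped);
}
```

Under these assumptions strict_accepts(0x0080) is true, matching the 0x80 case, while strict_accepts(0xF8F3) is false, matching the failure Windows reports for 0xFF.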
--- Comment #11 from Dmitry Timoshkov dmitry@codeweavers.com 2011-04-03 08:43:03 CDT ---
(In reply to comment #9)
> Created an attachment (id=33913)
> --> (http://bugs.winehq.org/attachment.cgi?id=33913)
> proposed patch
This should be done in libs/wine/mbtowc.c, check_invalid_chars_dbcs() instead.
--- Comment #12 from Dmitry Timoshkov dmitry@codeweavers.com 2011-04-03 08:46:18 CDT ---
(In reply to comment #11)
> > Created an attachment (id=33913)
> > --> (http://bugs.winehq.org/attachment.cgi?id=33913)
> > proposed patch
> This should be done in libs/wine/mbtowc.c,check_invalid_chars_dbcs() instead.
Hmm, please ignore my comment; somehow I sent it in reply to the wrong patch, sorry.
--- Comment #13 from Sagawa sagawa.aki+winebugs@gmail.com 2011-04-03 09:35:28 CDT ---
Created an attachment (id=33916) --> (http://bugs.winehq.org/attachment.cgi?id=33916) testcase for the patch
(In reply to comment #11)
> This should be done in libs/wine/mbtowc.c,check_invalid_chars_dbcs() instead.
Thank you for your comments. The third hunk of the patch is for check_invalid_chars_dbcs().
Why did I also patch check_invalid_chars_sbcs()? Because some SBCS code pages, e.g. CP1255, have the same problem. The attached test case examines this behavior.
--- Comment #14 from Dmitry Timoshkov dmitry@codeweavers.com 2011-04-03 22:59:54 CDT ---
Please send to wine-patches (both the test and the fix).
Austin English austinenglish@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #15 from Austin English austinenglish@gmail.com 2011-04-05 13:15:51 CDT ---
Fixed by http://source.winehq.org/git/wine.git/commitdiff/16d57370090a63560e01e646995...
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #16 from Alexandre Julliard julliard@winehq.org 2011-04-15 12:49:56 CDT ---
Closing bugs fixed in 1.3.18.