https://bugs.winehq.org/show_bug.cgi?id=39297
Bug ID: 39297
Summary: kernel32.IsValidCodePage and friends don't support code page 708.
Product: Wine
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: kernel32
Assignee: wine-bugs@winehq.org
Reporter: htl10@users.sourceforge.net
Distribution: ---
Microsoft's FontValidator (https://www.microsoft.com/typography/FontValidator.mspx), when processing certain fonts, tries to access code page 708, which is not available under Wine.
FYI, Mono supports code page 708; cf. mcs/class/I18N/Rare/CP708.cs.
Hin-Tak Leung htl10@users.sourceforge.net changed:
What    |Removed |Added
----------------------------------------------------------------------------
Version |unspecified |1.7.51
--- Comment #1 from Hin-Tak Leung htl10@users.sourceforge.net --- FYI, the list of code pages supported by Wine seems to be just the files wine/libs/wine/c_*.c.
Anastasius Focht focht@gmx.net changed:
What     |Removed |Added
----------------------------------------------------------------------------
Keywords | |dotnet, download
URL      | |https://www.microsoft.com/typography/FontValidator.mspx
CC       | |focht@gmx.net
Summary  |kernel32.IsValidCodePage and friends don't support code page 708. |Microsoft FontValidator (.NET 2.0 app) wants code page 708 when processing certain fonts
Hin-Tak Leung htl10@users.sourceforge.net changed:
What     |Removed |Added
----------------------------------------------------------------------------
Keywords |dotnet |
Summary  |Microsoft FontValidator (.NET 2.0 app) wants code page 708 when processing certain fonts |kernel32.IsValidCodePage and friends don't support code page 708.
--- Comment #2 from Hin-Tak Leung htl10@users.sourceforge.net --- dotnet is not relevant. I can supply a native-code test app if that helps. The mention of Mono is just to indicate a source of information, as there seems to be no standard for this encoding.
FWIW, GNU libc also supports this encoding:
$ iconv -l | grep -i 708
ASMO-708//
--- Comment #3 from Hin-Tak Leung htl10@users.sourceforge.net --- The application calls kernel32.IsValidCodePage() and friends via a mixed mode assembly, and that part is native code.
--- Comment #4 from Hin-Tak Leung htl10@users.sourceforge.net --- The functions that FontValidator tries to call are:
BOOL kernel32.GetCPInfo( UINT CodePage, LPCPINFO lpCPInfo )
BOOL kernel32.IsValidCodePage( UINT CodePage )
BOOL kernel32.IsDBCSLeadByteEx( UINT CodePage, BYTE TestChar )
int kernel32.MultiByteToWideChar( UINT CodePage, DWORD dwFlags, LPCSTR lpMultiByteStr, int cbMultiByte, LPWSTR lpWideCharStr, int cchWideChar )
As one can see, all of them take a code page argument.
For most (all?) code pages, these eventually trace back to one of wine/libs/wine/c_*.c.
For example, for code page 950 (Traditional Chinese), there is a file wine/libs/wine/c_950.c.
Whereas c_950.c and friends are derived from public sources on www.unicode.org, I cannot (yet) find an authoritative source for what code page 708 is, although both glibc and Mono claim to support this code page in their internationalization support.
--- Comment #5 from Nikolay Sivov bunglehead@gmail.com --- The question is: does this tool work on Windows for the same fonts? (Please name them if possible; I can test it if you don't have access to a Windows machine.)
Another easy thing to try is to add a test with this code page number for GetCPInfoEx, for example, to see whether it works at all and, if it does, what is returned as a name.
--- Comment #6 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Nikolay Sivov from comment #5)
> The question is: does this tool work on Windows for the same fonts? (Please name them if possible; I can test it if you don't have access to a Windows machine.)
Of course it does; it is a Microsoft tool!
For example, on Windows 8.1 (which comes with this code page), if you use the tool to analyse Windows 8.1's tahoma.ttf, the generated report says something like:
A CodePage bit is set in ulCodePageRange, but the font is missing some of the printable characters from that codepage, bit #61, Arabic; ASMO 708 (49 missing, first ten missing chars are: U2502 U2524 U2561 U2562 U2556 U2555 U2563 U2551 U2557 U255D)
but when running under Wine + dotnet 2, it says something about the code page not being installed.
Note that:
1. For some unknown reason, it refuses to navigate to C:\Windows\Fonts on Windows 8.1, but you can copy tahoma.ttf to the desktop to test.
2. Some parts of it do not work with Wine + wine-mono, though this part does. If you are testing under Wine + wine-mono or Wine + dotnet, you should deselect all of the table tests except OS/2, and also choose to "save report file" to a location of your choice, rather than the default "open after analysis" (from a temp location).
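The check behind the report line above can be sketched as follows. The sketch walks the code page's byte-to-Unicode table, skips unmapped and control entries, and collects the code points the font does not cover. It is a minimal illustration, not FontValidator's actual logic. The 0xFFFF "unmapped" marker, the 8192-byte BMP coverage bitmap standing in for the font's cmap, and the function names are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CP_UNDEFINED 0xFFFFu /* assumed marker for bytes with no mapping */

/* Tests code point `cp` in a 65536-bit BMP coverage bitmap, used here
   as a stand-in for the font's cmap. */
static int font_covers(const uint8_t coverage[8192], uint16_t cp)
{
    return (coverage[cp >> 3] >> (cp & 7)) & 1;
}

/* Counts the code page's printable characters that the font lacks and
   stores the first `max_report` of them, in byte order, in `missing`. */
size_t find_missing_chars(const uint16_t table[256],
                          const uint8_t coverage[8192],
                          uint16_t *missing, size_t max_report)
{
    size_t count = 0;
    for (int b = 0; b < 256; b++) {
        uint16_t cp = table[b];
        /* Assumption: unmapped bytes and C0/C1 controls are not
           "printable" and are skipped. */
        if (cp == CP_UNDEFINED || cp < 0x20 || (cp >= 0x7F && cp <= 0x9F))
            continue;
        if (!font_covers(coverage, cp)) {
            if (count < max_report)
                missing[count] = cp;
            count++;
        }
    }
    return count;
}
```

Reporting the returned count together with the first ten entries of `missing` mirrors the shape of the "49 missing, first ten missing chars are ..." message. The ten code points listed there (U+2502, U+2524, U+2561, ...) are box-drawing characters. That fits the description of code page 708 as adding line-drawing characters to the Arabic letters.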
> Another easy thing to try is to add a test with this code page number for GetCPInfoEx, for example, to see whether it works at all and, if it does, what is returned as a name.
What is the purpose of this test? I have already told you what the problem is and where the bulk of the new code needs to be added: there needs to be a new file, wine/libs/wine/c_708.c, and an extra line in wine/libs/wine/cptable.c to register the new code page table, just like all the others!
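Conceptually, the registration step mentioned here amounts to a lookup by code page number. Every query such as IsValidCodePage() or MultiByteToWideChar() first finds the table for the requested number and fails if none is registered. The sketch below illustrates that idea only. `struct sbcs_codepage`, `register_codepage()` and the other names are hypothetical and do not reproduce Wine's actual cptable.c definitions, which may be organized differently (for example, as a static array).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one single-byte code page. */
struct sbcs_codepage {
    unsigned int codepage;  /* e.g. 1256 or 708 */
    const char *name;       /* text a GetCPInfoEx()-style call would return */
    const uint16_t *cp2uni; /* 256 entries: byte -> UTF-16 code unit */
};

#define MAX_CODEPAGES 64

static const struct sbcs_codepage *registry[MAX_CODEPAGES];
static size_t registry_count;

/* Adds a table; supporting code page 708 would mean one more entry like
   this. Returns 0 if the registry is full. */
int register_codepage(const struct sbcs_codepage *cp)
{
    if (registry_count >= MAX_CODEPAGES)
        return 0;
    registry[registry_count++] = cp;
    return 1;
}

/* Linear lookup by number; returns NULL if the code page is unknown. */
const struct sbcs_codepage *find_codepage(unsigned int codepage)
{
    for (size_t i = 0; i < registry_count; i++)
        if (registry[i]->codepage == codepage)
            return registry[i];
    return NULL;
}

/* Analogue of IsValidCodePage(): true only if a table is registered. */
int is_valid_codepage(unsigned int codepage)
{
    return find_codepage(codepage) != NULL;
}
```

With this layout, `is_valid_codepage(708)` keeps failing until a table for 708 is registered, which matches the symptom reported in this bug.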
--- Comment #7 from Hin-Tak Leung htl10@users.sourceforge.net --- According to https://www.microsoft.com/typography/otspec/os2.htm#cpr
code page 708 is "Arabic; ASMO 708".
--- Comment #8 from Hin-Tak Leung htl10@users.sourceforge.net --- Note that code page 708 is not the same as code page 1256 (Arabic), which Wine already supports.
Dmitry Timoshkov dmitry@baikal.ru changed:
What           |Removed |Added
----------------------------------------------------------------------------
Status         |UNCONFIRMED |NEW
Ever confirmed |0 |1
--- Comment #9 from Dmitry Timoshkov dmitry@baikal.ru --- http://www.unicode.org/Public/MAPPINGS doesn't have a ready-to-use mapping for code page 708; however, according to http://coq.no/character-tables/dos/en: "This encoding is an almost compatible superset of ISO 8859/6 (all Arabic letters in the same positions, only one incompatible assignment, adding line-drawing characters and lowercase French accented lowercase vowels)". So codepage 28596 (ISO 8859-6 Arabic) could probably be used as an alias/replacement.
--- Comment #10 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Dmitry Timoshkov from comment #9)
> http://www.unicode.org/Public/MAPPINGS doesn't have a ready-to-use mapping for code page 708; however, according to http://coq.no/character-tables/dos/en: "This encoding is an almost compatible superset of ISO 8859/6 (all Arabic letters in the same positions, only one incompatible assignment, adding line-drawing characters and lowercase French accented lowercase vowels)". So codepage 28596 (ISO 8859-6 Arabic) could probably be used as an alias/replacement.
"Almost compatible superset" is a gross understatement and misdirection. I found 28 code point differences and 45 additions out of 256. Depending on how you count, that is 28 (about 10%) or 73 (about 30%) incompatible code points, which is not "almost" compatible. "Almost compatible" is a joke.
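The two counts in this comment can be made precise with a small comparison over two 256-entry byte-to-Unicode tables. A byte counts as a "difference" when both tables map it but to different code points. It counts as an "addition" when only the second table maps it. For the figures quoted, 28/256 is about 10.9% and (28 + 45)/256 = 73/256 is about 28.5%. The 0xFFFF "unmapped" marker and the function names are assumptions of this sketch, and it is not necessarily how those numbers were obtained.

```c
#include <stdint.h>

#define CP_UNDEFINED 0xFFFFu /* assumed marker for unmapped bytes */

struct table_diff {
    int differences; /* mapped in both tables, to different code points */
    int additions;   /* unmapped in `base`, mapped in `other` */
    int removals;    /* mapped in `base`, unmapped in `other` */
};

/* Classifies each of the 256 byte values by comparing two tables. */
struct table_diff compare_tables(const uint16_t base[256],
                                 const uint16_t other[256])
{
    struct table_diff d = {0, 0, 0};
    for (int b = 0; b < 256; b++) {
        int in_base = base[b] != CP_UNDEFINED;
        int in_other = other[b] != CP_UNDEFINED;
        if (in_base && in_other && base[b] != other[b])
            d.differences++;
        else if (!in_base && in_other)
            d.additions++;
        else if (in_base && !in_other)
            d.removals++;
    }
    return d;
}

/* Share of the 256 byte values, in percent. */
double percent_of_256(int n)
{
    return 100.0 * n / 256.0;
}
```

Suppose the sketch is called with an ISO 8859-6 table as `base` and a code page 708 table as `other`. Then `percent_of_256(d.differences)` and `percent_of_256(d.differences + d.additions)` correspond to the two ways of counting mentioned above.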
--- Comment #11 from Dmitry Timoshkov dmitry@baikal.ru --- (In reply to Hin-Tak Leung from comment #10)
>> http://www.unicode.org/Public/MAPPINGS doesn't have a ready-to-use mapping for code page 708; however, according to http://coq.no/character-tables/dos/en: "This encoding is an almost compatible superset of ISO 8859/6 (all Arabic letters in the same positions, only one incompatible assignment, adding line-drawing characters and lowercase French accented lowercase vowels)". So codepage 28596 (ISO 8859-6 Arabic) could probably be used as an alias/replacement.
> "Almost compatible superset" is a gross understatement and misdirection. I found 28 code point differences and 45 additions out of 256. Depending on how you count, that is 28 (about 10%) or 73 (about 30%) incompatible code points, which is not "almost" compatible. "Almost compatible" is a joke.
It depends on what you think is the most useful part of the table. Since that is apparently the Arabic alphabet, it's a good solution; if you need something else, it's probably worth at least mentioning that.
--- Comment #12 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Dmitry Timoshkov from comment #11)
> It depends on what you think is the most useful part of the table. Since that is apparently the Arabic alphabet, it's a good solution; if you need something else, it's probably worth at least mentioning that.
FontValidator already tests for Arabic code page 1256, which I believe is ISO 8859-6, in bit 6. Bit 61 is something else. This is a tool for testing compliance with an ISO specification, 14496-22 (the OpenType format), so nothing less than an exact match is good enough.
'Useful' and '30% incompatible' aren't good enough when testing for compliance with an ISO standard. '30% incompatible' is not compatible, as far as standard compliance is concerned.
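For reference, the 64 code page bits are split across two 32-bit fields of the OS/2 table: ulCodePageRange1 (bits 0 to 31) and ulCodePageRange2 (bits 32 to 63). So bit 6 (code page 1256) and bit 61 (code page 708, "Arabic; ASMO 708") live in different fields. Below is a minimal helper to test a bit; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns nonzero if code page bit `bit` (0 to 63) is set, given the
   ulCodePageRange1 and ulCodePageRange2 fields of a font's OS/2 table.
   Bit 6 is code page 1256; bit 61 is code page 708 ("Arabic; ASMO 708"). */
int codepage_bit_set(uint32_t range1, uint32_t range2, unsigned bit)
{
    if (bit < 32)
        return (int)((range1 >> bit) & 1u);
    if (bit < 64)
        return (int)((range2 >> (bit - 32)) & 1u);
    return 0; /* out of range */
}
```

A font can therefore claim bit 6 without bit 61, or the reverse. This is why a validator has to check the two code pages separately.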
--- Comment #13 from Dmitry Timoshkov dmitry@baikal.ru --- (In reply to Hin-Tak Leung from comment #12)
>> It depends on what you think is the most useful part of the table. Since that is apparently the Arabic alphabet, it's a good solution; if you need something else, it's probably worth at least mentioning that.
> FontValidator already tests for Arabic code page 1256, which I believe is ISO 8859-6, in bit 6. Bit 61 is something else. This is a tool for testing compliance with an ISO specification, 14496-22 (the OpenType format), so nothing less than an exact match is good enough.
> 'Useful' and '30% incompatible' aren't good enough when testing for compliance with an ISO standard. '30% incompatible' is not compatible, as far as standard compliance is concerned.
If you could provide a reference to a code page table in the format of the unicode.org tables, or one that could easily be adapted, that would be great.
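The unicode.org single-byte mapping files are plain text, with one byte per line in the form `0x41<TAB>0x0041<TAB>#LATIN CAPITAL LETTER A`. Comments start with `#`, and undefined bytes are listed without a Unicode column. Assuming that layout, a table in a compatible format could be loaded along the lines below. The function names and the 0xFFFF marker are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define CP_UNDEFINED 0xFFFFu /* hypothetical marker for unmapped bytes */

/* Marks every byte as unmapped before parsing. */
void init_table(uint16_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = CP_UNDEFINED;
}

/* Parses one line such as "0x41\t0x0041\t#LATIN CAPITAL LETTER A".
   Comment lines, blank lines and "0x81\t\t#UNDEFINED"-style entries
   store nothing. Returns 1 if a mapping was stored, 0 otherwise. */
int parse_mapping_line(const char *line, uint16_t table[256])
{
    unsigned int byte, ucs;
    if (line[0] == '#')
        return 0;
    /* %x accepts an optional 0x prefix; the space also skips tabs. */
    if (sscanf(line, "%x %x", &byte, &ucs) != 2)
        return 0;
    if (byte > 0xFF || ucs > 0xFFFF)
        return 0;
    table[byte] = (uint16_t)ucs;
    return 1;
}
```

Reading such a file line by line with `fgets()` and passing each line to `parse_mapping_line()` would fill in the whole table.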
--- Comment #14 from Nikolay Sivov bunglehead@gmail.com --- Perhaps https://msdn.microsoft.com/en-us/library/cc195061.aspx would do. Characters below 0x20 are presumably mapped directly.
--- Comment #15 from Dmitry Timoshkov dmitry@baikal.ru --- (In reply to Nikolay Sivov from comment #14)
> Perhaps https://msdn.microsoft.com/en-us/library/cc195061.aspx would do. Characters below 0x20 are presumably mapped directly.
That's not really a table but rather a chart.
--- Comment #16 from Hin-Tak Leung htl10@users.sourceforge.net --- Created attachment 52399 --> https://bugs.winehq.org/attachment.cgi?id=52399 encoding code point to unicode mapping table of code page 708
This is, to the best of my knowledge, the mapping table of code page 708. It is derived from hexdumping c_708.nls from a typical Windows installation, together with an online description of what the format of the NLS file type might be.
--- Comment #17 from Hin-Tak Leung htl10@users.sourceforge.net --- BTW, the NLS file format seems to be undocumented, and Wine therefore cannot or does not make use of such files, but that's the subject of bug 39298, "kernel32 does not support custom nls installation".
--- Comment #18 from Dmitry Timoshkov dmitry@baikal.ru --- (In reply to Hin-Tak Leung from comment #16)
> Created attachment 52399 encoding code point to unicode mapping table of code page 708
> This is, to the best of my knowledge, the mapping table of code page 708. It is derived from hexdumping c_708.nls from a typical Windows installation, together with an online description of what the format of the NLS file type might be.
Unfortunately I don't think that Wine can use this (and attaching such a dump may violate copyright); there is a reason why Wine uses unicode.org tables.
--- Comment #19 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Nikolay Sivov from comment #14)
> Perhaps https://msdn.microsoft.com/en-us/library/cc195061.aspx would do. Characters below 0x20 are presumably mapped directly.
That chart seems to be wrong or outdated. In modern Windows, code points 243-248 are definitely used, AFAIK. See the table attached to this bug.
On real Windows, you can probably write a little program to dump the mapping table for code page 708 by doing something like this:
------------------------------------------
char inputbuf[1];
wchar_t outputbuf[1];
UINT CodePage = 708;

for (unsigned short c = 0; c < 256; c++) {
    inputbuf[0] = (char)c;
    if (MultiByteToWideChar(CodePage, MB_ERR_INVALID_CHARS, inputbuf, 1, outputbuf, 1) != 0) {
        // dump c and outputbuf[0] in two columns
    }
}
-----------------------------------------
and hopefully it should result in something identical to what I attached.
--- Comment #20 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Dmitry Timoshkov from comment #18) ...
> Unfortunately I don't think that Wine can use this (and attaching such a dump may violate copyright); there is a reason why Wine uses unicode.org tables.
That's the "authoritative" source I derived the table from. As I wrote in previous comments, both Mono and glibc claim to support code page 708, so they must somehow have got a listing of the mapping table, or an equivalent, from somewhere.
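--- Comment #21 from Hin-Tak Leung htl10@users.sourceforge.net --- On a glibc system (i.e. Linux), you can get more or less the same mapping table by doing something like this:
perl -e 'for ($i = 0; $i<256; $i++) {printf "%c\n", ($i);}' \
  | iconv -t UTF16 -f ASMO-708 -c | hexdump -C
(The newline printed after each byte also appears in the output, and -c silently drops bytes that cannot be converted, so the positions need some care when reading the hexdump. Note too that glibc, like the IANA charset registry, lists ASMO-708 as an alias of ISO-8859-6, so this likely reproduces the ISO 8859-6 mapping rather than Windows code page 708.)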
--- Comment #22 from Austin English austinenglish@gmail.com --- The content of attachment 52399 has been deleted for the following reason:
Reverse engineered from Windows dll
--- Comment #23 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Austin English from comment #22)
> The content of attachment 52399 has been deleted for the following reason:
> Reverse engineered from Windows dll
FWIW, NLS files are not DLLs. The format does not seem to be officially documented, but there is an online description of it: it is basically a header plus two arrays, one for the raw encoding and another for console output (they differ only in the unprintable characters below 0x20, I think), plus the reverse table from Unicode back to code points.
So for single-byte encodings (i.e. non-CJK), a little way into the file there is simply an array of 512 bytes telling you how bytes 0-255 map to Unicode (UTF-16). It is hardly "reverse engineering" if you simply read 2 x 256 bytes and write them out.
The array is easy to spot because, for single-byte encodings, ASCII characters map to themselves, so it shows up as the ASCII sequence with each byte followed by a null (the high byte of each UTF-16 value).
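The heuristic described here can be expressed as a plain search: look for a 512-byte run in which bytes 0x20 to 0x7E map to themselves as little-endian UTF-16. This is only a sketch of that idea, under the layout described in this comment. It does not parse the undocumented header, and it returns the first match even though the comment mentions two similar forward arrays. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 16-bit value. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Searches `buf` for a 256-entry UTF-16LE table whose entries for bytes
   0x20..0x7E equal the byte value (printable ASCII maps to itself).
   Returns the offset of entry 0, or -1 if no candidate is found. */
long find_sbcs_table(const unsigned char *buf, size_t len)
{
    const size_t table_bytes = 256 * 2;
    for (size_t off = 0; off + table_bytes <= len; off++) {
        int ok = 1;
        for (unsigned c = 0x20; c <= 0x7E && ok; c++)
            if (read_le16(buf + off + 2 * c) != c)
                ok = 0;
        if (ok)
            return (long)off;
    }
    return -1;
}

/* Copies the 256 entries starting at `off` into `table`. */
void read_sbcs_table(const unsigned char *buf, size_t off, uint16_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = read_le16(buf + off + 2 * (size_t)i);
}
```

If the file really contains two forward arrays, searching again past the first match would locate the second one.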
Wine's source never documented how wine/loader/l_intl.nls was made, but if the logic for writing it were implemented and extended to reading such files too, you could close this bug as a duplicate of
bug 39298 - "kernel32 does not support custom nls installation",
since Wine being able to read NLS files would also mean that code page 708 could be installed as an add-on.
--- Comment #24 from Hin-Tak Leung htl10@users.sourceforge.net --- (In reply to Hin-Tak Leung from comment #19)
Alternatively, if you fill out the program below and make running it under Wine match the result of running it under Windows, would that be considered reverse engineering? All it does is make one array inside Wine match, so you can probably do it "clean-room" style, building the array one element at a time.
> On real Windows, you can probably write a little program to dump the mapping table for code page 708 by doing something like this:
> char inputbuf[1];
> wchar_t outputbuf[1];
> UINT CodePage = 708;
>
> for (unsigned short c = 0; c < 256; c++) {
>     inputbuf[0] = (char)c;
>     if (MultiByteToWideChar(CodePage, MB_ERR_INVALID_CHARS, inputbuf, 1, outputbuf, 1) != 0) {
>         // dump c and outputbuf[0] in two columns
>     }
> }
> and hopefully it should result in something identical to what I attached.
--- Comment #25 from Austin English austinenglish@gmail.com --- (In reply to Hin-Tak Leung from comment #24)
> (In reply to Hin-Tak Leung from comment #19)
> Alternatively, if you fill out the program below and make running it under Wine match the result of running it under Windows, would that be considered reverse engineering?
Questions like this belong on wine-devel, not everyone reads wine-bugs.
--- Comment #26 from Hin-Tak Leung htl10@users.sourceforge.net --- Here is the full working version of the test code:
-----------------
#include <windows.h>
#include <stdio.h>

int main(void)
{
    char inputbuf[1];
    wchar_t outputbuf[1];
    UINT CodePage = 708;

    /* Convert each byte value on its own. With MB_ERR_INVALID_CHARS the
       call returns 0 for input the code page treats as invalid, so those
       bytes are left out of the table. */
    for (unsigned short c = 0; c < 256; c++) {
        inputbuf[0] = (char)c;
        if (MultiByteToWideChar(CodePage, MB_ERR_INVALID_CHARS, inputbuf, 1, outputbuf, 1) != 0) {
            printf("0x%02X 0x%04X\n", (unsigned)c, (unsigned)outputbuf[0]);
        }
    }
    return 0;
}
-----------------
You can cross-compile it with:
i686-w64-mingw32-gcc -Wall -o test.exe cp708test.c
then run with
wine test.exe
It just writes the mapping table out as a two-column table.
Change 708 to 1252 to test code page 1252, etc., if you wish.
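In the spirit of the clean-room comparison from comment #24, the output of this program on Windows and under Wine can be diffed mechanically. The sketch below parses the two-column "0x%02X 0x%04X" output into a table and prints the bytes where two runs disagree. The function names and the 0xFFFF "not printed" marker are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define NOT_PRINTED 0xFFFFu /* byte absent from the dump (conversion failed) */

/* Loads "0x%02X 0x%04X" lines (the whole dump as one string) into
   `table`; returns the number of entries loaded. */
int load_dump(const char *text, uint16_t table[256])
{
    unsigned int byte, ucs;
    int consumed, loaded = 0;
    for (int i = 0; i < 256; i++)
        table[i] = NOT_PRINTED;
    while (sscanf(text, "%x %x%n", &byte, &ucs, &consumed) == 2) {
        if (byte <= 0xFF && ucs <= 0xFFFF) {
            table[byte] = (uint16_t)ucs;
            loaded++;
        }
        text += consumed; /* continue after the pair just parsed */
    }
    return loaded;
}

/* Prints every byte on which the two runs disagree; returns the count. */
int report_mismatches(const uint16_t windows[256], const uint16_t wine[256])
{
    int mismatches = 0;
    for (int b = 0; b < 256; b++) {
        if (windows[b] != wine[b]) {
            printf("0x%02X: windows 0x%04X, wine 0x%04X\n",
                   (unsigned)b, (unsigned)windows[b], (unsigned)wine[b]);
            mismatches++;
        }
    }
    return mismatches;
}
```

Bytes for which MultiByteToWideChar() failed are simply absent from a dump, so they show up as 0xFFFF on that side of the comparison.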
--- Comment #27 from Nikolay Sivov bunglehead@gmail.com --- This was added with https://source.winehq.org/git/wine.git/?a=commit;h=1ca4536f7edcc884711a7dab3.... One thing still missing is the textual name that GetCPInfoEx() returns; I'll send another patch for that.
Please retest.
Nikolay Sivov bunglehead@gmail.com changed:
What          |Removed |Added
----------------------------------------------------------------------------
Status        |NEW |RESOLVED
Fixed by SHA1 | |1ca4536f7edcc884711a7dab39870df5d20c9785
Resolution    |--- |FIXED
--- Comment #28 from Nikolay Sivov bunglehead@gmail.com --- The missing name was added with https://source.winehq.org/git/wine.git/commit/af17fcbc1c4ccb354208d1d45dbb03.... Marking fixed.
Alexandre Julliard julliard@winehq.org changed:
What   |Removed |Added
----------------------------------------------------------------------------
Status |RESOLVED |CLOSED
--- Comment #29 from Alexandre Julliard julliard@winehq.org --- Closing bugs fixed in 5.20.
Anastasius Focht focht@gmx.net changed:
What |Removed |Added
----------------------------------------------------------------------------
URL  |https://www.microsoft.com/typography/FontValidator.mspx |https://web.archive.org/web/20180223132312/http://download.microsoft.com/download/F/E/9/FE9795A3-756E-4F60-8989-03DC9870F189/fontvalsetup.msi