"Erich E. Hoover" ehoover@mymail.mines.edu writes:
Real Name: Erich Hoover
Description: This patch adds the ability in HTML Help to convert HTML encoded characters (e.g. &ecirc;) into the Unicode character equivalent. This feature is needed by the table of contents and the index for displaying international characters in some CHM files. As of version 3 of the patch, the decoding is done manually by parsing the HTML characters instead of using the web browser control, the search is now a binary search, and some additional cleanup is included. It is important to note that HTML Help only supports characters within the ANSI code pages, so support for multi-byte characters is not necessary.
I don't see why you'd want to use ANSI code pages for this, since they will be converted to Unicode anyway.
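For readers following along, the binary-search lookup mentioned in the patch description might look roughly like the sketch below. This is only an illustration in Python, not the patch itself: the three-entry table, the names `ENTITIES` and `NAMES`, and the function `lookup_entity` are hypothetical, and the byte values are simply the Latin-1 positions of those characters.

```python
from bisect import bisect_left

# Hypothetical, abbreviated table of (entity name, byte value) pairs.
# It must stay sorted by name so that it can be binary-searched.
ENTITIES = [
    ("aacute", 0xE1),
    ("eacute", 0xE9),
    ("ecirc", 0xEA),
]
NAMES = [name for name, _ in ENTITIES]


def lookup_entity(name):
    """Return the byte value for a named entity, or None if it is unknown."""
    i = bisect_left(NAMES, name)  # leftmost position where name could be inserted
    if i < len(NAMES) and NAMES[i] == name:
        return ENTITIES[i][1]
    return None
```

With this table, `lookup_entity("ecirc")` yields the single byte value 0xEA, which still has to be interpreted in some character set.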
On Thu, Jun 14, 2012 at 11:22 AM, Alexandre Julliard julliard@winehq.org wrote:
...
I don't see why you'd want to use ANSI code pages for this, since they will be converted to Unicode anyway.
It's important to use the ANSI code page because of the way the characters are handled internally (you can see this in part 4). If you just convert the characters straight to Unicode then an entity such as &ecirc; will become "ê" instead of "ę" in the Polish (?) code page (similar behavior in other code pages). You can see an example in the screenshot from Bug #27767: http://bugs.winehq.org/attachment.cgi?id=35529. If you converted straight to Unicode then you'd need to convert it a second time to get the proper character, unless I'm missing something.
Erich
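The difference described above can be reproduced with a short Python sketch. It assumes that Python's `cp1250` and `cp1252` codecs are reasonable stand-ins for the Central European and Western European Windows ANSI code pages; the helper `decode_entity_byte` is a hypothetical name used only for illustration.

```python
def decode_entity_byte(value, code_page):
    """Interpret a single byte value in the given ANSI code page."""
    return bytes([value]).decode(code_page)


byte_value = 0xEA  # byte value associated with &ecirc;

# Treat 0xEA directly as a Unicode code point.
direct = chr(byte_value)

# Treat 0xEA as a byte in an ANSI code page, then convert to Unicode.
via_cp1250 = decode_entity_byte(byte_value, "cp1250")  # Central European
via_cp1252 = decode_entity_byte(byte_value, "cp1252")  # Western European

print(direct, via_cp1250, via_cp1252)
```

Under these assumptions the direct conversion and the cp1252 conversion both give U+00EA (ê), while the cp1250 conversion gives U+0119 (ę), which matches the behavior reported in the bug.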
"Erich E. Hoover" ehoover@mymail.mines.edu writes:
On Thu, Jun 14, 2012 at 11:22 AM, Alexandre Julliard julliard@winehq.org wrote:
...
I don't see why you'd want to use ANSI code pages for this, since they will be converted to Unicode anyway.
It's important to use the ANSI code page because of the way the characters are handled internally (you can see this in part 4). If you just convert the characters straight to Unicode then an entity such as &ecirc; will become "ê" instead of "ę" in the Polish (?) code page (similar behavior in other code pages). You can see an example in the screenshot from Bug #27767: http://bugs.winehq.org/attachment.cgi?id=35529. If you converted straight to Unicode then you'd need to convert it a second time to get the proper character, unless I'm missing something.
Wow, is Windows really doing it this way? That's a good contestant for the Most Stupid Win32 API Behavior award.
On Thu, Jun 14, 2012 at 11:44 AM, Alexandre Julliard julliard@winehq.org wrote:
... Wow, is Windows really doing it this way? That's a good contestant for the Most Stupid Win32 API Behavior award.
I can't think of any other way they'd be doing it... It honestly blew me away to see that screenshot, I wouldn't have thought to implement it that way in a million years.
Erich
On Thu, Jun 14, 2012 at 11:47 AM, Erich E. Hoover ehoover@mines.edu wrote:
... I can't think of any other way they'd be doing it... It honestly blew me away to see that screenshot, I wouldn't have thought to implement it that way in a million years.
The bug submitter was using xchm, so for completeness I ran the example CHM file on XP and I see the exact same behavior (http://bugs.winehq.org/attachment.cgi?id=40547). If you can think of another way you'd like me to approach this bug then let me know.
Erich