https://bugs.winehq.org/show_bug.cgi?id=47893
Bug ID: 47893
Summary: Unicode characters aren't being rendered correctly
Product: WineHQ.org
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: www-unknown
Assignee: wine-bugs@winehq.org
Reporter: huupoke12@gmail.com
Distribution: ---
This is probably a regression since I didn't run into this issue before. Unicode characters aren't being rendered correctly. Examples:
https://appdb.winehq.org/commentview.php?iAppId=9581&iVersionId=28025&am... https://appdb.winehq.org/objectManager.php?sClass=application&iId=2586
Fabian Maurer dark.shadow4@web.de changed:
What           |Removed                |Added
----------------------------------------------------------------------------
CC             |                       |dark.shadow4@web.de
Ken Sharp imwellcushtymelike@gmail.com changed:
What           |Removed                |Added
----------------------------------------------------------------------------
Ever confirmed |0                      |1
Status         |UNCONFIRMED            |NEW
--- Comment #1 from Ken Sharp imwellcushtymelike@gmail.com --- I don't know if it's Unicode only, but there are random characters appearing all over the site.
Ken Sharp imwellcushtymelike@gmail.com changed:
What           |Removed                |Added
----------------------------------------------------------------------------
Product        |WineHQ.org             |WineHQ Apps Database
Component      |www-unknown            |appdb-unknown
--- Comment #2 from Ken Sharp imwellcushtymelike@gmail.com --- Created attachment 65701 --> https://bugs.winehq.org/attachment.cgi?id=65701 Screenshot
*shudders*
There must be some (hidden?) characters upsetting some of the fields.
--- Comment #3 from huupoke12@gmail.com --- Yes, but I think it is related to the change of web server. The first link is a comment I posted, and at that time it was still rendered correctly: 日本語 (escape sequence, in case that is unreadable: \u65e5\u672c\u8a9e). I think the server saved the comment as UTF-8 without a BOM, but does not read it back as UTF-8 and treats it as ASCII instead.
--- Comment #4 from huupoke12@gmail.com --- It seems that the server saves the text as UTF-8 but reads it back as ISO 8859-1, and non-printable characters are replaced with either some other printable character or "?".
For example, with the first one: the original comment is "日本語" and the garbled version is "æ—¥æœ¬èªž". When "日本語" is saved in a text file as UTF-8, its hex dump is "e697 a5e6 9cac e8aa 9e0a" (the final "0a" is the trailing newline). But when the file is read as ISO 8859-1, "e6" is "æ", "97" is "END OF GUARDED AREA" (unprintable) (https://www.fileformat.info/info/unicode/char/0097/index.htm) and is replaced with "—" (printable), "a5" is "¥"...
By the way, I tested this by changing my name on the AppDB. It shows fine on the page where the name is changed (Preferences), even in Japanese. But when it is shown on a comment, the characters are replaced with "?". After more testing, it seems that only the Basic Latin range (0x0000 => 0x007F) and the Latin-1 Supplement range (0x0080 => 0x00FF) are displayed correctly; characters in all other ranges are replaced with "?".
This is consistent with the above: ASCII uses only 7 bits and can represent only the Basic Latin range (128 characters), whereas ISO 8859-1 uses 8 bits and can represent both the Basic Latin range and the Latin-1 Supplement range (256 characters). UTF-8 also works in 8-bit units, and both UTF-8 and ISO 8859 are backward compatible with ASCII, but they are not compatible with each other. In UTF-8, the most significant bit of a byte marks it as part of a non-ASCII character and tells the decoder to read the following bytes and decode them together as a single character. ISO 8859 uses that bit to make room for more characters instead. Thus every byte (0x00 => 0xFF) is a valid character when read as ISO 8859, while not every byte sequence is valid UTF-8.
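The byte-level reasoning in this comment can be reproduced with a short Python sketch. It is only an illustration, not the AppDB code: the helper names `misread_utf8` and `is_valid_utf8` are made up for this example, and the use of Windows-1252 as the "wrong" charset is an assumption, chosen because it maps byte 0x97 to a printable dash, which matches what is described above.

```python
def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a complete, valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "日本語"
utf8_bytes = original.encode("utf-8")

# The same bytes as in the hex dump above, minus the trailing newline.
print(utf8_bytes.hex(" "))

# Assumption: the reader treats the bytes as Windows-1252.
print(misread_utf8(original))

# With strict ISO 8859-1, byte 0x97 becomes the C1 control character U+0097.
print(repr(misread_utf8(original, "latin-1")[1]))

# Every single byte decodes under ISO 8859-1, but a lone lead byte
# such as 0xE6 is not valid UTF-8.
print(len(bytes(range(256)).decode("latin-1")))
print(is_valid_utf8(b"\xe6"))
```

Decoding with "latin-1" instead of "cp1252" gives the same result for bytes 0xA0 to 0xFF and differs only in the 0x80 to 0x9F range, which is where the unprintable characters mentioned above come from.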
Šaňo Beller Aero@Aether.sk changed:
What           |Removed                |Added
----------------------------------------------------------------------------
CC             |                       |Aero@Aether.sk
--- Comment #5 from Šaňo Beller Aero@Aether.sk --- Created attachment 66975 --> https://bugs.winehq.org/attachment.cgi?id=66975 Showing invalid unicode in maintainer name
I've found this issue as well when looking at maintainer names and such. Attached an image showing this. Link can be seen here:
https://appdb.winehq.org/objectManager.php?sClass=version&iId=37229
It should show as "Šaňo"
--- Comment #6 from Nguyễn Chính Hữu huupoke12@gmail.com --- Yes, as I said, only the Basic Latin block (https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)) and the Latin-1 Supplement block (https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29) are shown properly. This seems to be the behavior of ISO/IEC 8859-1 (https://en.wikipedia.org/wiki/ISO_8859-1).
If a character is outside those blocks, it can't be shown and is replaced with a "?". You can try changing the "Real Name" in your preferences to see it.
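The "?" behavior is what one would expect from a lossy conversion to ISO 8859-1. A minimal Python sketch, assuming (this is a guess, not something confirmed in this report) that the text is forced through an ISO 8859-1 conversion that substitutes "?" for anything it cannot represent; the function name `squeeze_to_latin1` is hypothetical:

```python
def squeeze_to_latin1(text: str) -> str:
    """Round-trip text through ISO 8859-1, replacing unencodable characters with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


# Characters from the Latin-1 Supplement block survive unchanged.
print(squeeze_to_latin1("öäüßÖÄÜ"))

# Characters outside Basic Latin and Latin-1 Supplement are lost.
print(squeeze_to_latin1("Šaňo"))
print(squeeze_to_latin1("日本語"))
```

Once a character has been replaced this way, the original cannot be recovered from the stored text.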
Joerg Schiermeier mywine@schiermeier-it.de changed:
What           |Removed                |Added
----------------------------------------------------------------------------
CC             |                       |mywine@schiermeier-it.de
--- Comment #7 from Joerg Schiermeier mywine@schiermeier-it.de --- This bug still exists. I can see it with German umlauts and similar characters, such as öäüßÖÄÜ.
Mostly this appears in the "Developer" section. As an example see this: https://appdb.winehq.org/objectManager.php?sClass=application&iId=6582
The name of the developer contains an "ö". If it is not possible to solve this problem, "ö" could be replaced by "oe", "ä" by "ae" and "ü" by "ue". Capital letters like "Ü" would be replaced by "Ue", and so on.
--- Comment #8 from Rosanne DiMesio dimesio@earthlink.net --- Someone recently posted about this on the forum, specifically about Swedish characters. https://forum.winehq.org/viewtopic.php?t=38851
The confusing thing is that there are many AppDB entries where these characters are displayed properly--in the entry from comment #7, the "ö" in the developer's name was displayed correctly on one part of the page, incorrectly in another section of the same page. In all cases I've looked at, I was able to fix the display simply by editing the entry to retype the character, and saving it.
I have also noticed entries that have the same problem with curly quotes and em- and en-dashes. I've been wondering whether the problem stems from someone copy-pasting text from a word processing program rather than typing it in directly.
--- Comment #9 from Ken Sharp imwellcushtymelike@gmail.com --- All the characters used to display correctly, then somebody changed something a while back and they all became corrupted. NEW entries are displayed correctly (since that change) but the old ones are unlikely to ever be fixed, unless the effect of that unknown change can be reversed.
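Whether the corruption can be reversed depends on what was actually stored, which is not known here. As a rough sketch only: if an old entry holds UTF-8 bytes that were decoded as Windows-1252 (an assumption) and nothing was dropped along the way, the misreading can be undone by applying it in reverse. The function name `try_repair` is hypothetical.

```python
from typing import Optional


def try_repair(garbled: str, wrong_encoding: str = "cp1252") -> Optional[str]:
    """Undo a UTF-8-read-as-wrong_encoding mix-up; return None if it does not apply."""
    try:
        return garbled.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters the wrong charset cannot produce,
        # or the recovered bytes are not valid UTF-8: leave such entries alone.
        return None


print(try_repair("æ—¥æœ¬èªž"))  # garbled form of the Japanese example
print(try_repair("Ã¶"))          # garbled form of "ö"
print(try_repair("ö"))           # already correct text is rejected (None)
print(try_repair("?a?o"))        # plain ASCII passes through unchanged
```

Text that has already been reduced to "?" stays as it is, so such entries would still need to be retyped by hand.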
--- Comment #10 from Rosanne DiMesio dimesio@earthlink.net --- The change happened back in 2008, bug #16514. I fixed hundreds of entries manually back then. But there are much more recent entries with the problem, so something else is going on.
In any case, I fix the entries when I run across them, and if someone reports the problem on the forum, with links to the affected entries, I'll fix those too.