http://bugs.winehq.org/show_bug.cgi?id=18191
Summary: Notepad always saves as ISO-8859, instead of UTF-8
Product: Wine
Version: 1.1.19
Platform: PC
OS/Version: Linux
Status: NEW
Keywords: download
Severity: minor
Priority: P2
Component: programs
AssignedTo: wine-bugs@winehq.org
ReportedBy: austinenglish@gmail.com
There are actually a few bugs here, but I'll start with the first one I noticed:

$ wine notepad
Type/paste "Stefan Dösinger"  # First UTF-8 I found in git log :-)
File, Save as foobar.txt
$ cat foobar.txt
You should get "Stefan Dösinger" back, but instead the "ö" comes out garbled: it was written as the single byte 0xF6 rather than as UTF-8.
With windows notepad, however, you'll get "Stefan Dösinger".
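A minimal C sketch of why the two outputs differ (not Wine code; the function names are hypothetical): "ö" is U+00F6, which UTF-8 stores as the two bytes C3 B6, while ISO-8859-1 stores it as the single byte F6, which a UTF-8 terminal cannot display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8. Returns the number of bytes
   written to out (1-4), or 0 for surrogates and values above U+10FFFF. */
int encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* ISO-8859-1 maps U+0000-U+00FF to the byte with the same value; anything
   else cannot be represented. Returns 1 on success, 0 otherwise. */
int encode_latin1(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp > 0xFF)
        return 0;
    *out = (unsigned char)cp;
    return 1;
}
```

For example, `encode_utf8(0x00F6, buf)` writes C3 B6, while `encode_latin1(0x00F6, buf)` writes F6.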
--- Comment #1 from Dmitry Timoshkov dmitry@codeweavers.com 2009-04-25 05:55:36 ---
Wine notepad always saves in the current Windows encoding (just like Win9x and NT 4.0 do), so it's normal that the result doesn't match your Linux locale. Windows notepad, at least in XP, is able to save in UTF-8, but only if asked to.
So, you probably need to change the subject to "Notepad should support saving in UTF-8", and change severity to enhancement.
--- Comment #2 from Dmitry Timoshkov dmitry@codeweavers.com 2009-04-25 05:57:33 ---
Austin, I see that you have opened bug 18192 for additional encodings support, so this bug is invalid then.
--- Comment #3 from Austin English austinenglish@gmail.com 2009-04-25 13:15:08 ---
(In reply to comment #1)
> Windows notepad at least in XP is able to save in UTF-8, but only if asked for.
It uses the IsTextUnicode function to determine this automatically (albeit sometimes incorrectly). See:
http://msdn.microsoft.com/en-us/library/dd318672(VS.85).aspx
http://en.wikipedia.org/wiki/Notepad
http://en.wikipedia.org/wiki/Bush_Hid_The_Facts
This bug is separate from bug 18192, which is about having those dialog options when saving the file. This bug is for supporting IsTextUnicode and saving the file appropriately.
--- Comment #4 from Dmitry Timoshkov dmitry@codeweavers.com 2009-04-25 20:26:03 ---
Well, the subject of this bug describes both the current Wine behaviour and the desired one (how Windows notepad behaves) incorrectly, i.e. it's simply wrong.
This bug is essentially invalid due to that.
Austin English austinenglish@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Summary |Notepad always saves as ISO-8859, instead of UTF-8 |Notepad should detect text type and adjust encoding automatically
--- Comment #5 from Austin English austinenglish@gmail.com 2009-04-25 23:46:47 ---
(In reply to comment #4)
> Well, the subject of this bug describes current Wine behaviour and the desired one (how Windows notepad behaves) incorrectly, i.e. it's simply wrong.

We have plenty of cases in wine where Windows behaves incorrectly, yet we replicate it.

> This bug is essentially invalid due to that.

Adjusting summary appropriately.
--- Comment #6 from Dmitry Timoshkov dmitry@codeweavers.com 2009-04-26 01:04:07 ---
The subject is still wrong. Windows notepad doesn't detect the encoding: it always uses Unicode internally (notepad in Wine does that as well), and it is simply able to save as Unicode (which Wine notepad can't do).
Austin English austinenglish@gmail.com changed:
What    |Removed |Added
----------------------------------------------------------------------------
Summary |Notepad should detect text type and adjust encoding automatically |Notepad should detect text type and prompt to change saved file encoding automatically
--- Comment #7 from Austin English austinenglish@gmail.com 2009-04-26 02:06:23 ---
(In reply to comment #6)
> The subject is still wrong, Windows notepad doesn't detect the encoding, it always uses unicode internally (notepad in Wine does that as well), and just is able to save as unicode (which Wine notepad can't do).

In my past testing, it always saves as ISO-8859 (English locale), but if Unicode-only characters are used, I'm warned that data loss may occur and that I should save as UTF-8.

I don't have Windows at home; I'll test this on Monday at work to get a reproducible test case.
Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |alexander.scott.johns+winebug@googlemail.com
--- Comment #8 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-04-26 07:47:58 ---
(In reply to comment #7)
> In my past testing, it always saves as ISO-8859 (English locale), but if unicode only characters are used, I'm warned that data loss may occur and that I should save as UTF-8.
Windows XP Notepad remembers the encoding of the file when it is opened, and uses the same encoding when saving (although UTF16-BE will become UTF16-LE). If the file is new, the default encoding is the active ANSI codepage. When saving, if there are characters in the file which cannot be represented in the target codepage, Notepad will put up a warning:
== Notepad ==
<filename>
This file contains characters in Unicode format which will be lost if you save this file as an ANSI encoded text file. To keep the Unicode information, click Cancel below and then select one of the Unicode options from the Encoding drop down list.
Continue?
[ OK ] [ Cancel ]
IsTextUnicode can only detect UTF16-LE and UTF16-BE, and not UTF8, I believe.
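A rough sketch of the kind of statistical check that makes BOM-less UTF-16 detectable at all: mostly-ASCII text stored as UTF-16 has a zero in every other byte, while UTF-8 has no comparable signature. This is a simplified, hypothetical heuristic, not IsTextUnicode's actual algorithm (which combines several IS_TEXT_UNICODE_* tests); the names and the 50% threshold are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum guess16 { GUESS_NOT_UTF16, GUESS_UTF16LE, GUESS_UTF16BE };

/* Guess whether a BOM-less buffer is UTF-16 from where the zero bytes
   fall. Mostly-ASCII UTF-16LE text has 0x00 in the odd byte of each
   code unit; UTF-16BE has it in the even byte. The "at least half of
   the code units" threshold is arbitrary. */
enum guess16 guess_utf16(const unsigned char *buf, size_t len)
{
    size_t units = len / 2, zero_even = 0, zero_odd = 0;

    assert(buf != NULL || len == 0);
    if (units == 0 || len % 2 != 0)
        return GUESS_NOT_UTF16;

    for (size_t i = 0; i < units; i++) {
        if (buf[2 * i] == 0)
            zero_even++;
        if (buf[2 * i + 1] == 0)
            zero_odd++;
    }
    if (zero_odd * 2 >= units && zero_even * 2 < units)
        return GUESS_UTF16LE;
    if (zero_even * 2 >= units && zero_odd * 2 < units)
        return GUESS_UTF16BE;
    return GUESS_NOT_UTF16;
}
```

Text with many non-Latin characters would defeat this simple test, which is one reason the real function uses several independent checks.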
Also, WinXP Notepad inserts a byte-order mark when saving as UTF8 (even when there wasn't one before): the three bytes { 0xEF, 0xBB, 0xBF }.
--- Comment #9 from Dmitry Timoshkov dmitry@codeweavers.com 2009-04-26 09:03:36 ---
(In reply to comment #7)
> In my past testing, it always saves as ISO-8859 (English locale), but if unicode only characters are used, I'm warned that data loss may occur and that I should save as UTF-8.

Most likely, notepad in Windows checks whether the conversion from Unicode to the current code page loses data, and warns if it does. That's not "detection", just a WideCharToMultiByte feature.
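On Windows, WideCharToMultiByte reports this through its lpUsedDefaultChar out-parameter, which is set when some character had to be replaced by the default character. The portable sketch below does the same check against ISO-8859-1, used as a stand-in for the ANSI code page (an assumption: real code pages such as Windows-1252 differ in the 0x80-0x9F range); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 code units to ISO-8859-1, writing n bytes to dst and
   replacing anything unrepresentable with '?'. Returns true if any
   replacement happened, i.e. saving would lose data and a warning like
   Notepad's should be shown. A surrogate pair yields two '?' here. */
bool utf16_to_latin1_lossy(const uint16_t *src, size_t n, unsigned char *dst)
{
    bool lost = false;

    assert((src != NULL && dst != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (unsigned char)src[i];
        } else {
            dst[i] = '?';   /* the "default character" */
            lost = true;
        }
    }
    return lost;
}
```

Saving "Dösinger" converts cleanly, while text containing "ł" (U+0142) sets the flag.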
--- Comment #10 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-02 09:44:44 ---
Patches which implement opening and saving in Unicode were sent and have been committed:
http://source.winehq.org/git/wine.git/?a=commit;h=42729bc1c1cc513a82cae4f793...
http://source.winehq.org/git/wine.git/?a=commit;h=8b6b7b2c39d77f7cd29657ecc3...
http://source.winehq.org/git/wine.git/?a=commit;h=080cc909929dc4eb64711120b6...
http://source.winehq.org/git/wine.git/?a=commit;h=67766392bf735c6c9007213bd1...
http://source.winehq.org/git/wine.git/?a=commit;h=84fd1c84f8f2393290438f452a...
Files saved in UTF-8 by Wine Notepad have a byte-order mark. Unix apps may not expect this.
Files saved in UTF-8 by Unix apps will not have a byte-order mark. To open these in Wine Notepad, you need to change the encoding.
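The open-side behaviour amounts to a BOM check with an ANSI fallback. A minimal sketch, assuming a hypothetical `enum text_enc`; it illustrates the rule described here and is not the committed Wine code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding labels for this sketch. */
enum text_enc { ENC_ANSI, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE };

/* Inspect the start of a file. Returns the encoding implied by a BOM and
   stores the BOM length in *bom_len; without a BOM it falls back to the
   ANSI code page. */
enum text_enc detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(bom_len != NULL);
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    *bom_len = 0;
    return ENC_ANSI;
}
```

A Unix tool that does not expect a BOM could skip the first `*bom_len` bytes; a BOM-less UTF-8 file, in contrast, lands in the `ENC_ANSI` branch and is decoded with the ANSI code page.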
--- Comment #11 from Austin English austinenglish@gmail.com 2009-07-02 09:49:26 ---
(In reply to comment #10)
> Patches which implement opening and saving in Unicode were sent and have been committed:
> [snip]

Great work! I'll whip up some test files and compare them on wine/windows notepad.
--- Comment #12 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-02 09:52:06 ---
(In reply to comment #11)
> Great work! I'll whip up some test files and compare them on wine/windows notepad.

I already have some test files. I'll attach them here.
--- Comment #13 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-02 10:02:26 ---
Created an attachment (id=22143) --> (http://bugs.winehq.org/attachment.cgi?id=22143)
Patch which adds some test files to programs/notepad directory
There are test files in Latin1, UTF16-LE, UTF16-BE and UTF-8. For the last three encodings, there are files both with and without a byte-order mark.
Note: some chunks are Git binary patches and others contain unusual characters.
--- Comment #14 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-02 10:07:51 ---
Created an attachment (id=22144) --> (http://bugs.winehq.org/attachment.cgi?id=22144)
Patch which adds some test files to programs/notepad directory (note: application/octet-stream is deliberate)
Test files in Latin1, UTF16-LE, UTF16-BE and UTF-8 are provided. For the last three encodings, there are files both with and without a byte-order mark.
Note: some chunks are Git binary patches and others contain unusual characters, so I'm deliberately setting the content type to application/octet-stream so it isn't corrupted.
Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #22143 is obsolete |0       |1
Attachment #22144 is obsolete |0       |1
--- Comment #15 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-02 10:52:02 ---
Created an attachment (id=22145) --> (http://bugs.winehq.org/attachment.cgi?id=22145)
Tarballed test files
Test files in Latin1, UTF16-LE, UTF16-BE and UTF-8 are provided. For the last three encodings, there are files both with and without a byte-order mark.
This time provided in an archive.
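A minimal sketch of how UTF-16LE/BE test data with and without a BOM can be produced from a list of code points. This is not how the attached files were made; the function names are hypothetical, and the caller is assumed to supply a large enough buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Append one 16-bit code unit in the requested byte order. */
static void put_unit(unsigned char *out, size_t *pos, uint16_t u, bool big_endian)
{
    unsigned char hi = (unsigned char)(u >> 8), lo = (unsigned char)(u & 0xFF);
    out[(*pos)++] = big_endian ? hi : lo;
    out[(*pos)++] = big_endian ? lo : hi;
}

/* Write n code points as UTF-16, optionally preceded by a BOM (U+FEFF).
   Code points above U+FFFF become surrogate pairs. out must hold at least
   2 + 4 * n bytes. Returns the number of bytes written. Surrogate code
   points passed in directly are not rejected by this sketch. */
size_t encode_utf16(const uint32_t *cps, size_t n, bool big_endian,
                    bool with_bom, unsigned char *out)
{
    size_t pos = 0;

    assert(out != NULL);
    if (with_bom)
        put_unit(out, &pos, 0xFEFF, big_endian);
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = cps[i];
        if (cp >= 0x10000) {
            cp -= 0x10000;
            put_unit(out, &pos, (uint16_t)(0xD800 | (cp >> 10)), big_endian);
            put_unit(out, &pos, (uint16_t)(0xDC00 | (cp & 0x3FF)), big_endian);
        } else {
            put_unit(out, &pos, (uint16_t)cp, big_endian);
        }
    }
    return pos;
}
```

With a BOM, little-endian output starts with FF FE and big-endian output with FE FF, which is what distinguishes the "with BOM" variants.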
--- Comment #16 from Austin English austinenglish@gmail.com 2009-07-02 12:29:27 ---
(In reply to comment #15)
> Created an attachment (id=22145)
> Tarballed test files
> [snip]
Thanks! Saves me some effort. I'll test against windows notepad and make sure we're consistent. Next up, to automate some tests...
--- Comment #17 from Austin English austinenglish@gmail.com 2009-07-03 12:10:56 ---
(In reply to comment #16)
> Thanks! Saves me some effort. I'll test against windows notepad and make sure we're consistent. Next up, to automate some tests...
Seems wine botches all files without a byte-order mark. See screenshots.
--- Comment #18 from Austin English austinenglish@gmail.com 2009-07-03 12:13:28 ---
Created an attachment (id=22152) --> (http://bugs.winehq.org/attachment.cgi?id=22152)
notepad, xp vs wine
--- Comment #19 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-03 12:16:08 ---
(In reply to comment #17)
> Seems wine botches all files without a byte-order mark. See screenshots.
This is because in native Notepad \0 characters are converted into spaces; in Wine, the file is truncated at the first \0.
I have two patches that fix this...
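The native behaviour described above can be sketched as a pass over the decoded text before it is handed to the edit control, so an embedded NUL no longer terminates the string. A minimal sketch, assuming the file has already been decoded into a wide-character buffer of known length; the function name is hypothetical and this is not the committed patch:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Replace every embedded L'\0' in text[0..len) with a space so the whole
   buffer is shown instead of being cut off at the first NUL. Returns the
   number of characters replaced. */
size_t nul_to_space(wchar_t *text, size_t len)
{
    size_t replaced = 0;

    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == L'\0') {
            text[i] = L' ';
            replaced++;
        }
    }
    return replaced;
}
```

The length must come from the decoder rather than from wcslen(), since wcslen() would itself stop at the first NUL.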
--- Comment #20 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-03 13:30:30 ---
Created an attachment (id=22154) --> (http://bugs.winehq.org/attachment.cgi?id=22154)
Tarball containing 2 patches that implement converting nul characters to spaces
--- Comment #21 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-04 20:34:28 ---
Patches sent (slightly changed from those in tarball):
http://www.winehq.org/pipermail/wine-patches/2009-July/075387.html
http://www.winehq.org/pipermail/wine-patches/2009-July/075388.html
--- Comment #22 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-06 13:15:17 ---
Patches committed:
http://source.winehq.org/git/wine.git/?a=commit;h=93b99e0af36fd6290822a13d2e...
http://source.winehq.org/git/wine.git/?a=commit;h=32c066f12abbf3864ec75b5cb6...
Austin English austinenglish@gmail.com changed:
What                          |Removed |Added
----------------------------------------------------------------------------
Attachment #22152 is obsolete |0       |1
Attachment #22154 is obsolete |0       |1
--- Comment #23 from Austin English austinenglish@gmail.com 2009-07-06 18:08:33 ---
Created an attachment (id=22231) --> (http://bugs.winehq.org/attachment.cgi?id=22231)
screenshot
Almost there. test-utf8-nobom.txt still has a problem. See screenshot.
Thanks for all your work!
--- Comment #24 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-06 19:00:33 ---
(In reply to comment #23)
> Almost there. test-utf8-nobom.txt still has a problem. See screenshot.
This screenshot shows that Wine Notepad has opened the <utf8 -bom> file in Latin1, and WinXP has opened it in UTF-8. This is expected behaviour for Wine Notepad if there is no byte-order mark (0xef, 0xbb, 0xbf). I think WinXP Notepad should do the same as Wine, unless it has a heuristic for detecting UTF-8. When opening the <utf8 -bom> file, what encoding does WinXP Notepad select in the Open dialog? Did you override it?
Maybe the byte-order mark was added back? If the file was saved within Notepad (either WinXP or Wine), then this would happen automatically. Try forcing Latin1 in WinXP and see whether the file starts with the characters ï»¿ (i with two dots, right guillemet, inverted ?).
--- Comment #25 from Austin English austinenglish@gmail.com 2009-07-07 15:30:49 ---
(In reply to comment #24)
> [snip]
> When opening the <utf8 -bom> file, what encoding does WinXP Notepad select in the Open dialog? Did you override it?
It shows UTF-8. Keep in mind that utf16-le/be with no bom work fine. It's only utf-8 that's broken.
> Maybe the byte-order mark was added back? If the file was saved within Notepad (either WinXP or Wine), then this would happen automatically. Try forcing Latin1 in WinXP and see whether the file starts with the characters ï»¿ (i with two dots, right guillemet, inverted ?).
No, it wasn't. It's fresh from your tarball. Forcing it to open as ANSI gives: «utf8 -bom ßłøþ»
which looks a lot like wine, except for 'Ÿ'.
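One plausible explanation for the 'Ÿ': when UTF-8 is read one byte per character, ß (C3 9F) becomes two characters, and byte 0x9F is 'Ÿ' (U+0178) in Windows-1252 but an invisible C1 control character in ISO-8859-1. The sketch assumes the XP machine's ANSI code page was Windows-1252 (usual for English locales) and that Wine showed the file as ISO-8859-1; both are assumptions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 differs from ISO-8859-1 only in bytes 0x80-0x9F. The five
   unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are kept here as the
   same-valued C1 control, as in ISO-8859-1. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Decode bytes one per character, as a single-byte "ANSI" reader would.
   With use_cp1252 false this is ISO-8859-1 (code point == byte value);
   otherwise bytes 0x80-0x9F go through the table above. */
void decode_single_byte(const unsigned char *src, size_t n, bool use_cp1252,
                        uint32_t *dst)
{
    assert((src != NULL && dst != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = src[i];
        if (use_cp1252 && b >= 0x80 && b <= 0x9F)
            dst[i] = cp1252_high[b - 0x80];
        else
            dst[i] = b;
    }
}
```

Decoding C3 9F gives U+00C3 U+009F under ISO-8859-1 but U+00C3 U+0178 ("ÃŸ") under Windows-1252.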
Perhaps you need to use the IsTextUnicode() function here...
--- Comment #26 from Alexander Scott-Johns alexander.scott.johns+winebug@googlemail.com 2009-07-07 17:06:27 ---
(In reply to comment #25)
>> When opening the <utf8 -bom> file, what encoding does WinXP Notepad select in the Open dialog? Did you override it?
> It shows UTF-8. Keep in mind that utf16-le/be with no bom work fine. It's only utf-8 that's broken.
That's interesting... so WinXP Notepad detected the file was in utf8. This is quite hard to do.
> Perhaps you need to use the IsTextUnicode() function here...
This doesn't support detecting utf8, unfortunately.
--- Comment #27 from Dmitry Timoshkov dmitry@codeweavers.com 2009-07-09 10:42:02 ---
(In reply to comment #26)
> That's interesting... so WinXP Notepad detected the file was in utf8. This is quite hard to do.

MultiByteToWideChar(CP_UTF8) should report an error if the input is not proper UTF-8.
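Note that MultiByteToWideChar only reports malformed input as an error when the MB_ERR_INVALID_CHARS flag is passed. The portable sketch below performs an equivalent strict check, the kind a "try UTF-8 first, otherwise fall back to ANSI" heuristic could use; whether XP Notepad works this way is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Strict UTF-8 check: rejects stray continuation bytes, truncated
   sequences, overlong encodings, UTF-16 surrogates (U+D800-U+DFFF) and
   code points above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t need;                         /* continuation bytes */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (b < 0x80) { i++; continue; }
        else if (b >= 0xC2 && b <= 0xDF) need = 1;
        else if (b == 0xE0) { need = 2; lo = 0xA0; }  /* no overlongs */
        else if (b == 0xED) { need = 2; hi = 0x9F; }  /* no surrogates */
        else if (b >= 0xE1 && b <= 0xEF) need = 2;
        else if (b == 0xF0) { need = 3; lo = 0x90; }  /* no overlongs */
        else if (b == 0xF4) { need = 3; hi = 0x8F; }  /* <= U+10FFFF */
        else if (b >= 0xF1 && b <= 0xF3) need = 3;
        else return false;                   /* 0x80-0xC1, 0xF5-0xFF */

        if (len - i <= need)
            return false;                    /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += need + 1;
    }
    return true;
}
```

Latin-1 text such as the single byte F6 for "ö" fails this check, which is what would let such a heuristic tell the two apart; pure-ASCII files pass it and read the same either way.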
Bruno Jesus 00cpxxx@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |00cpxxx@gmail.com
--- Comment #28 from Bruno Jesus 00cpxxx@gmail.com 2012-04-14 12:09:49 CDT ---
test-utf8-nobom.txt still shows the same incorrect text as in comment 25, so this bug is still present. Or maybe it should be renamed and marked as fixed, and a new bug opened for this specific file, since several other things were fixed here.
Ken Sharp imwellcushtymelike@gmail.com changed:
What     |Removed |Added
----------------------------------------------------------------------------
Keywords |        |source
--- Comment #29 from Ken Sharp imwellcushtymelike@gmail.com ---
As above: has this become a metabug?
Austin English austinenglish@gmail.com changed:
What          |Removed |Added
----------------------------------------------------------------------------
Fixed by SHA1 |        |32c066f12abbf3864ec75b5cb6e9fb08c288c50b
Status        |NEW     |RESOLVED
Resolution    |---     |FIXED
Summary       |Notepad should detect text type and prompt to change saved file encoding automatically |Notepad corrupts files saved as utf-8 without a byte order mark (BOM)
--- Comment #30 from Austin English austinenglish@gmail.com ---
(In reply to Bruno Jesus from comment #28)
> test-utf8-nobom.txt still shows the same incorrect text from comment 25. So this bug is still present.

I filed bug 38909 for that.

> Or maybe it should be renamed and marked as fixed and a new bug about this specific file should be opened since there were several other things fixed here.

Sure. Since utf-8 got its own bug (18192), let's make this one about corrupting files without BOMs.

I've also filed bug 38910 for the original issue: notepad should detect whether the file contains characters which cannot be represented in the target codepage.
Alexandre Julliard julliard@winehq.org changed:
What   |Removed  |Added
----------------------------------------------------------------------------
Status |RESOLVED |CLOSED
--- Comment #31 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 1.7.48.