Hans Leidekker wrote:
On Sat, 17 May 2003, Shachar Shemesh wrote:
No, they are in whatever locale the string is. In particular, the entire keyboard code is filled to the brim with strings, each with a different locale. I'm talking about functional code here, not something which is only inside comments.
I know Wine sources are not declared as adhering to any particular character set, but when I display them using ISO_8859-1 I see the least distortions. That's why I said "it looks like" they are ISO_8859-1.
That's because people with names outside of the 8859-1 charset rarely assume that any client will be able to read their name, and so write it in Latin letters (the Japanese call it "Romaji"). European names, on the other hand, rarely have pure-Latin transcripts, because the letters are too similar. Irony.
No can do ASCII. A Hebrew "שלו×" will not look good, or at all, for that matter, in ASCII.
As your locale is UTF-8, you made my string twice as long `-)
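To see where the doubling comes from, here is a minimal Python sketch (not part of the original exchange). It encodes a three-letter Hebrew string as UTF-8 and then misreads the bytes as ISO 8859-1, which is roughly what happens when a client assumes the wrong charset. The sample string and variable names are only illustrative.

```python
# Illustrative only: a Hebrew string sent as UTF-8 but displayed as ISO 8859-1.
original = "\u05e9\u05d7\u05e8"          # shin, het, resh: 3 Hebrew letters
utf8_bytes = original.encode("utf-8")    # each Hebrew letter takes 2 bytes in UTF-8
misread = utf8_bytes.decode("latin-1")   # every byte becomes one Latin-1 character

print(len(original))    # 3
print(len(utf8_bytes))  # 6
print(len(misread))     # 6: the string now looks twice as long

# Decoding with the correct charset recovers the original text.
assert utf8_bytes.decode("utf-8") == original
```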
That's obvious. Hebrew won't look good in ISO_8859-1 either.
No, but it will, at least, be preserved. That is not critical for comments, but it is critical for non-Latin strings.
Then, like I said, your option is to "escape" characters outside ASCII-7, like Germans do with their umlauts.
Care to show what you mean?
If that Hebrew string you presented is your name,
Nah, far too long for that. My name is just three letters. Get the full story at http://www.shemesh.biz/sun.html.
then "Shachar" could be seen as an escaped ASCII-7 notation for it, couldn't it?
If you mean that instead of writing "שחר", I should write "\xf9\xe7\xf8", then I think you are talking non-practical solutions here. It took me less than a second to write the native version - I just typed it. It took me almost a minute to write the escaped version, and I can only speculate as to whether I got it right. I just redid it, because I had, in fact, not got it right. What CJK people are expected to do is not something I would like to contemplate. In addition to that, no one, not even Hebrew speakers, can reasonably be expected to understand what is written there. That is a major source of problems.
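Since hand-typed escapes are this easy to get wrong, a small script can produce them instead. The following is a sketch only, assuming the target encoding is ISO 8859-8 (Python's `iso8859_8` codec); the helper name `c_escape` is hypothetical and not part of any Wine tool. Its output can be compared against a hand-typed version.

```python
def c_escape(text, codec):
    # Encode the text in the given codec and spell out each byte as a
    # C-style \xNN escape.
    return "".join(f"\\x{b:02x}" for b in text.encode(codec))

name = "\u05e9\u05d7\u05e8"  # shin, het, resh

print(c_escape(name, "iso8859_8"))  # \xf9\xe7\xf8
print(c_escape(name, "utf-8"))      # \xd7\xa9\xd7\x97\xd7\xa8
```

The same three letters need six escapes in UTF-8, which makes the escaped form even harder to read or check by eye.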
Having said that, there is one place where I did exactly this in the Wine sources. In dlls/commdlg/fontdlg.c, you can find, near the beginning of the file, a table of the characters that the font dialog should display for the corresponding locale. The entries in that table are in UTF-16, as I couldn't make each string of a different locale. As a result, they are, indeed, unreadable. As this is not a true string, but simply a few characters to demo a font, I'm hoping it will not matter much.
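For illustration, this is roughly how such an entry could be generated rather than typed by hand. It is a sketch under assumptions: the helper names are hypothetical, the output format is just one plausible way to write a zero-terminated array of 16-bit code units in C, and it does not reproduce the actual table in the Wine sources.

```python
def utf16_units(text):
    # Split the UTF-16 (little-endian) encoding into 16-bit code units.
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def c_initializer(text):
    # Render the code units as a C array initializer, with a terminating 0.
    units = utf16_units(text) + [0]
    return "{" + ", ".join(f"0x{u:04x}" for u in units) + "}"

print(c_initializer("\u05e9\u05d7\u05e8"))  # {0x05e9, 0x05d7, 0x05e8, 0x0000}
```

The result is locale-independent, but as unreadable as the paragraph above says.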
UTF-8 may work for resources, if the resource compiler is adjusted accordingly, but not inside the code, where the encoding actually matters for the code that parses it.
- Set character set to "C" or "ISO8859-1" prior to
running perl on the sources
That sounds better, I think... What does perl do with the sources again?
By Perl I in fact mean any Wine tool that's written in Perl. Mostly running regexps on the sources is what they do I guess.
Then I vote for this. 8859-1 will not distort the sources, which is all that is really required.
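The reason 8859-1 does not distort anything is that every byte value from 0 to 255 maps to exactly one character, so decoding and re-encoding is lossless. The sketch below shows this in Python rather than Perl; the function name, the sample source line and the pattern are all made up for the example, and it assumes the replacement text itself stays within Latin-1.

```python
import re

def rewrite_source(raw, pattern, repl):
    # Decoding as Latin-1 never fails: all 256 byte values are valid.
    text = raw.decode("latin-1")
    text = re.sub(pattern, repl, text)
    # Bytes the regexp did not touch come back exactly as they were.
    return text.encode("latin-1")

# A made-up source line holding a string in some 8-bit Hebrew encoding.
raw = b'char *s = "\xf9\xe7\xf8"; /* TODO */'
out = rewrite_source(raw, r"TODO", "FIXME")

assert out == b'char *s = "\xf9\xe7\xf8"; /* FIXME */'

# The round trip is the identity for every possible byte value.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```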
Plus you have not solved the functional strings problem.
What do you mean by "functional strings"?
I mean strings that actually perform some function, as opposed to comments. The most prominent example is keyboard.c, where each string is in a different encoding. The code in fontdlg.c is also an example.
-Hans
Much though I like UTF-8, I think it is totally and utterly inappropriate for handling the Wine code. Like it or not, MS chose UTF-16 (actually, they chose UCS-2, and then made it UTF-16 when it was invented, IIRC), and that's what Wine must choose as well. Given that fact, it makes no sense to have strings inside Wine in UTF-8, as that would require runtime conversions. If the strings are not UTF-8, there is no reason to make the comments so.
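A small Python sketch of the two points above, for illustration only: a string stored as UTF-8 has to be converted before it can be used where UTF-16 is expected, and UTF-16 differs from UCS-2 in that characters outside the Basic Multilingual Plane take two 16-bit units. The sample characters are arbitrary.

```python
text = "\u05e9\u05d7\u05e8"             # shin, het, resh

stored_utf8 = text.encode("utf-8")      # what a UTF-8 source would hold
wanted_utf16 = text.encode("utf-16-le") # what a UTF-16 API expects

# The extra step that would have to happen at run time.
converted = stored_utf8.decode("utf-8").encode("utf-16-le")
assert converted == wanted_utf16

# UCS-2 is fixed at one 16-bit unit per character; UTF-16 uses a
# surrogate pair for anything above U+FFFF.
clef = "\U0001d11e"                     # MUSICAL SYMBOL G CLEF
units = len(clef.encode("utf-16-le")) // 2
print(units)                            # 2
```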
Shachar