Unicode, i18n support


Francois Gouget

31 Mar 2002

5:05 a.m.

Bug 98 - Unicode, i18n support is quite vague. How could I flesh it out?

For instance I am thinking that we could convert this into a metabug and have one sub bug for each control that still needs to be unicodified. So:

* which are the common controls that still need to be unicodified?
* are there other controls that need to be unicodified?
* do common dialogs need work?

As usual the goal is to make the individual tasks more visible so as to give inspiration to potential contributors, and so that we have a better idea of our progress and what remains to be done.

Another issue, which would fall more in the i18n part is BiDi support. What is the scope of this? Does it require modifications of all the controls like unicode?

http://www.winehq.com/hypermail/wine-devel/2002/03/0236.html

Shachar, it sounds like you started working on this, should I say so when I create a bugzilla task for this? Even better, can I assign the task to you? (it can always be reassigned later, it's just a more flexible way of doing things).

-- Francois Gouget fgouget@free.fr http://fgouget.free.fr/ Broadcast message : fin du monde dans cinq minutes, repentez vous !


Dmitry Timoshkov

31 Mar

10:30 a.m.

"Francois Gouget" fgouget@free.fr wrote:

...

Bug 98 - Unicode, i18n support is quite vague. How could I flesh it out?

For instance I am thinking that we could convert this into a metabug and have one sub bug for each control that still needs to be unicodified. So:

which are the common controls that still need to be unicodified?

I believe that all standard USER controls are already converted to Unicode.

...

are there other controls that need to be unicodified

Probably controls provided by comctl32 need some additional work.

...

do common dialogs need work?

Probably.

One more thing that IMO should be addressed as part of Unicode support in Wine: file APIs. For instance, in the current state all Russian file names created by Windows programs are completely unreadable in Linux because they are created in code page 1251, but the native Russian encoding in Linux is KOI8-R (code page 20866). Therefore all file APIs should work with Unicode natively, converting file names to the code page specified somewhere in the config file (like [x11drv]/TextCP) on create, and vice versa when reading file names from disk.
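To make the mismatch concrete, here is a minimal Python sketch (not Wine code). It first shows the same bytes read under the two code pages, then the kind of conversion on create and on read that is proposed here. The helper names and the `host_encoding` parameter are hypothetical, and Python's `cp1251` and `koi8_r` codecs merely stand in for whatever conversion tables Wine would actually use.

```python
# Illustration only: hypothetical helpers, not part of Wine.

def host_name_from_unicode(name: str, host_encoding: str = "koi8_r") -> bytes:
    """Encode a Unicode file name into the bytes stored on the host disk."""
    return name.encode(host_encoding)


def unicode_from_host_name(raw: bytes, host_encoding: str = "koi8_r") -> str:
    """Decode bytes read from the host disk back into a Unicode file name."""
    return raw.decode(host_encoding)


name = "письмо.txt"  # "letter.txt" in Russian

# Current behaviour: the bytes are written in code page 1251 ...
on_disk = name.encode("cp1251")
# ... and a KOI8-R Linux application interprets them differently.
seen_by_linux = on_disk.decode("koi8_r")
print(seen_by_linux == name)  # False: the name is garbled

# With a conversion to the host encoding, the round trip is lossless.
converted = host_name_from_unicode(name)
print(unicode_from_host_name(converted) == name)  # True
```

The host encoding is a parameter rather than a constant because the thread makes clear that it differs between installations.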

...

Another issue, which would fall more in the i18n part is BiDi support. What is the scope of this? Does it require modifications of all the controls like unicode?

GetCharacterPlacement and GetFontLanguageInfo should be properly implemented first. Then only minor tweaks will be required to add support for BiDi to common controls such as Edit.

-- Dmitry.

Dimitrie O. Paun

3:40 p.m.

On March 31, 2002 05:30 am, Dmitry Timoshkov wrote:

...

One more thing that IMO should be addressed as part of Unicode support in Wine: file APIs. For instance, in the current state all Russian file names created by Windows programs are completely unreadable in Linux because they are created in code page 1251, but the native Russian encoding in Linux is KOI8-R (code page 20866). Therefore all file APIs should work with Unicode natively, converting file names to the code page specified somewhere in the config file (like [x11drv]/TextCP) on create, and vice versa when reading file names from disk.

But even that is changing in Linux, where filenames should use UTF8. Wouldn't that suffice?

-- Dimi.

Dmitry Timoshkov

11:08 p.m.

"Dimitrie O. Paun" dpaun@rogers.com wrote:

...

On March 31, 2002 05:30 am, Dmitry Timoshkov wrote:

...
One more thing that IMO should be addressed as part of Unicode support in Wine: file APIs. For instance, in the current state all Russian file names created by Windows programs are completely unreadable in Linux because they are created in code page 1251, but the native Russian encoding in Linux is KOI8-R (code page 20866). Therefore all file APIs should work with Unicode natively, converting file names to the code page specified somewhere in the config file (like [x11drv]/TextCP) on create, and vice versa when reading file names from disk.

But even that is changing in Linux, where filenames should use UTF8. Wouldn't that suffice?

Do you mean that all syscalls that accept a filename as a parameter can handle UTF8 encoded file names? Starting from what kernel version?

I'm afraid that will only help Linux-based installations of Wine, and in any case all functions in Wine implementing file APIs should use Unicode internally.

-- Dmitry.

Dimitrie O. Paun

1 Apr

4:27 a.m.

On March 31, 2002 06:08 pm, Dmitry Timoshkov wrote:

...

Do you mean that all syscalls that accept a filename as a parameter can handle UTF8 encoded file names? Starting from what kernel version?

Yes. I guess pretty much any kernel. From the kernel's perspective, it's a zero-terminated string of bytes. It's up to userland to interpret it. Right now it's interpreted as a code page encoding, but the plan going forward (I believe) is to interpret it as UTF8.
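The "string of bytes" point can be checked from userland. The following Python sketch creates a file whose name is not valid UTF-8 and reads the name back unchanged. It assumes a Linux filesystem such as ext4 or tmpfs that stores names as opaque bytes; other systems may reject such names. The function name is made up for the illustration.

```python
import os
import tempfile


def create_and_list(raw_name: bytes) -> list:
    """Create an empty file with the given byte name and list its directory."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.fsencode(tmp)
        path = os.path.join(directory, raw_name)
        with open(path, "wb"):
            pass
        # Passing bytes to os.listdir returns the names as raw bytes.
        return os.listdir(directory)


raw = "файл".encode("cp1251")  # these four bytes are not valid UTF-8
names = create_and_list(raw)

print(names == [raw])                        # the bytes come back untouched
print(names[0].decode("cp1251") == "файл")   # True: read with the right code page
print(names[0].decode("koi8_r") == "файл")   # False: same bytes, wrong code page
```

The interpretation happens only in the last two lines, that is, in the program and not in the system call.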

...

I'm afraid that will only help Linux-based installations of Wine, and in any case all functions in Wine implementing file APIs should use Unicode internally.

I think most Unices should support it. But yeah, internally we use UTF16 throughout.

-- Dimi.

Shachar Shemesh

7:07 a.m.

Dmitry Timoshkov wrote:

...

One more thing that IMO should be addressed as part of Unicode support in Wine: file APIs. For instance, in the current state all Russian file names created by Windows programs are completely unreadable in Linux because they are created in code page 1251, but the native Russian encoding in Linux is KOI8-R (code page 20866). Therefore all file APIs should work with Unicode natively, converting file names to the code page specified somewhere in the config file (like [x11drv]/TextCP) on create, and vice versa when reading file names from disk.

I don't think this is a WINE issue. Depending on whether you access those files via a local mount or SMB, the translation between 1251 and 20866 should be taken care of in the filesystem or via SMB respectively. After all, I don't see any reason that WINE will be able to access these files, but regular Linux/BSD users won't.

One more thing I don't understand is this. The filenames are a list of bytes. When setting the codepage inside WINE to 1251, does it still not see them correctly?

Shachar

Dmitry Timoshkov

11:36 a.m.

"Shachar Shemesh" wine-devel@sun.consumer.org.il wrote:

...

...
One more thing that IMO should be addressed as part of Unicode support in Wine: file APIs. For instance, in the current state all Russian file names created by Windows programs are completely unreadable in Linux because they are created in code page 1251, but the native Russian encoding in Linux is KOI8-R (code page 20866). Therefore all file APIs should work with Unicode natively, converting file names to the code page specified somewhere in the config file (like [x11drv]/TextCP) on create, and vice versa when reading file names from disk.

I don't think this is a WINE issue. Depending on whether you access those files via a local mount or SMB, the translation between 1251 and 20866 should be taken care of in the filesystem or via SMB respectively.

To make things worse, here are the general mount options used by *all* Russian users in the world to correctly see Cyrillic filenames from a vfat partition under Linux: mount -t vfat -o codepage=866,iocharset=koi8-r.
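As a rough model of what those two options do (a simplification, not the kernel's vfat code): `codepage` names the OEM code page used for 8.3 short names on disk, and `iocharset` names the 8-bit character set in which names are presented to programs, while VFAT long names are stored as UTF-16. The Python sketch below mimics the translation with hypothetical function names.

```python
# Simplified model of -o codepage=866,iocharset=koi8-r; not kernel code.

def short_name_to_userland(disk_bytes: bytes, codepage: str = "cp866",
                           iocharset: str = "koi8_r") -> bytes:
    """8.3 short names: OEM code page on disk -> iocharset for programs."""
    return disk_bytes.decode(codepage).encode(iocharset)


def long_name_to_userland(disk_bytes: bytes,
                          iocharset: str = "koi8_r") -> bytes:
    """VFAT long names: UTF-16 on disk -> iocharset for programs."""
    return disk_bytes.decode("utf_16_le").encode(iocharset)


name = "ФАЙЛ"  # "FILE" in Russian
short_on_disk = name.encode("cp866")
long_on_disk = name.encode("utf_16_le")

print(short_name_to_userland(short_on_disk) == name.encode("koi8_r"))  # True
print(long_name_to_userland(long_on_disk) == name.encode("koi8_r"))    # True
```

Either way the program sees KOI8-R bytes, which is why a KOI8-R system and a Windows system can share the same partition.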

...

After all, I don't see any reason that WINE will be able to access these files, but regular Linux/BSD users won't.

Filenames created by Windows programs running under Wine are of course seen correctly *under Wine*. Regular Linux applications, for obvious reasons, display them incorrectly but can still open them.

...

One more thing I don't understand is this. The filenames are a list of bytes. When setting the codepage inside WINE to 1251, does it still not see them correctly?

See above.

-- Dmitry.

Shachar Shemesh

12:57 p.m.

Dmitry Timoshkov wrote:

...

Filenames created by Windows programs running under Wine are of course seen correctly *under Wine*. Regular Linux applications, for obvious reasons, display them incorrectly but can still open them.

...
One more thing I don't understand is this. The filenames are a list of bytes. When setting the codepage inside WINE to 1251, does it still not see them correctly?

See above.

I think what you have now is the best solution you can opt for.

If I understood correctly, WINE uses 1251 but Linux uses 20866 (you'll have to excuse me, Russian was, in fact, on my "todo" list to learn, but I never got around to it). The reason I think this is the best move is that this way WINE is compatible with Windows, and creates no more serious compatibility problems with Windows than Windows does.

I would particularly recommend against switching WINE to 20866 in light of the fact that the only text editor I know that supports UNICODE is Notepad. Most applications are not UNICODE, and running those apps on WINE would become impossible if WINE did not use the same codepage as Windows. You don't know how many buggy apps are out there assuming things they shouldn't about things.

In my history I've had one application that stored binary stuff in the registry under the "string" type. All went well until my program stored that data intermediately as UNICODE, using a Japanese (MBCS) locale. Some byte combinations produced illegal characters and were translated back to "?", and the app would break. Moral: if compatibility is what you're after, be compatible. Don't rely on apps to behave themselves.
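The failure mode in this anecdote can be imitated in a few lines of Python. This is only an analogy: it uses Python's `cp932` codec (the Japanese Windows code page) with `errors="replace"` rather than the Windows conversion functions, and the function name and sample bytes are invented for the illustration.

```python
def round_trip(data: bytes, codepage: str = "cp932") -> bytes:
    """Decode bytes as text in the given code page, then encode them back.

    Invalid sequences are replaced instead of raising an error, which is
    the lossy behaviour the anecdote describes.
    """
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


# 0x81 starts a two-byte character in cp932, but 0x20 cannot finish one.
blob = bytes([0x10, 0x81, 0x20, 0x41])
result = round_trip(blob)

print(result != blob)                  # True: the binary data was altered
print(b"?" in result)                  # True: the bad byte came back as "?"
print(round_trip(b"abc") == b"abc")    # True: plain ASCII survives
```

Data that happens to be valid text survives, which is why such a bug can stay hidden until an unlucky byte combination appears.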

Shachar

Dmitry Timoshkov

1:09 p.m.

"Shachar Shemesh" wine-devel@sun.consumer.org.il wrote:

...

I would particularly recommend against switching WINE to 20866 in light of the fact that the only text editor I know that supports UNICODE is Notepad. Most applications are not UNICODE, and running those apps on WINE would become impossible if WINE did not use the same codepage as Windows. You don't know how many buggy apps are out there assuming things they shouldn't about things.

I probably didn't make my position clear enough, and you somehow misunderstood me and decided that I am proposing to use code page 20866 instead of 1251 for the Russian locale under Wine. But that's not the case. I only proposed creating filenames on disk in the encoding used natively in Linux.

-- Dmitry.

Shachar Shemesh

2:49 p.m.

Dmitry Timoshkov wrote:

...

I probably didn't make my position clear enough, and you somehow misunderstood me and decided that I am proposing to use code page 20866 instead of 1251 for the Russian locale under Wine. But that's not the case. I only proposed creating filenames on disk in the encoding used natively in Linux.

Will Windows then be able to read them in Russian? If not, I think having compatibility problems is not worth it.

Dmitry Timoshkov

2:04 p.m.

"Shachar Shemesh" wine-devel@sun.consumer.org.il wrote:

...

...
I probably didn't make my position clear enough, and you somehow misunderstood me and decided that I am proposing to use code page 20866 instead of 1251 for the Russian locale under Wine. But that's not the case. I only proposed creating filenames on disk in the encoding used natively in Linux.

Will Windows then be able to read them in Russian?

Sure, thanks to the magic mount options I mentioned earlier. Native localized Linux applications know nothing about anything other than KOI8-R, and filenames created by them on a vfat partition are readable under Windows just fine.

...

If not, I think having compatibility problems is not worth it.

Yes, I also think so.

-- Dmitry.

Waldek Hebisch

2 Apr

2:19 p.m.

...

misunderstood me and decided that I am proposing to use code page 20866 instead of 1251 for the Russian locale under Wine. But that's not the case. I only proposed creating filenames on disk in the encoding used natively in Linux.

I think that the right way is to have a code conversion option in wine.config, as one of the Wine mount options. That way we will be able to handle most weird configurations (like UTF-8 on native Linux, cp 1251 on removable media).
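One way to picture such a per-mount option is a table from mount point to file name encoding, consulted by longest matching prefix. The Python sketch below is purely illustrative: the table contents, the paths, and the function names are assumptions, and it does not reflect any actual Wine configuration syntax.

```python
import posixpath

# Hypothetical configuration: mount point -> encoding of names stored there.
# The root entry "/" must be present as the fallback.
MOUNT_ENCODINGS = {
    "/": "utf_8",
    "/mnt/floppy": "cp1251",
}


def encoding_for(path: str, mounts: dict = MOUNT_ENCODINGS) -> str:
    """Return the encoding of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best = "/"
    for mount in mounts:
        prefix = mount.rstrip("/") + "/"
        inside = path == mount or path.startswith(prefix)
        if inside and len(mount) > len(best):
            best = mount
    return mounts[best]


def host_bytes(path: str) -> bytes:
    """Encode a Unicode path for the filesystem it lives on (simplified:
    the whole path is encoded with the encoding of its mount point)."""
    return path.encode(encoding_for(path))


print(encoding_for("/home/user/notes.txt"))   # utf_8
print(encoding_for("/mnt/floppy/report.doc")) # cp1251
print(encoding_for("/mnt/floppy2/x"))         # utf_8
```

Matching on the mount point plus a trailing slash keeps `/mnt/floppy2` from being mistaken for something inside `/mnt/floppy`.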

-- Waldek Hebisch hebisch@math.uni.wroc.pl or hebisch@hera.math.uni.wroc.pl

Dmitry Timoshkov

2:42 p.m.

"Waldek Hebisch" hebisch@math.uni.wroc.pl wrote:

...

I think that the right way is to have a code conversion option in wine.config, as one of the Wine mount options. That way we will be able to handle most weird configurations (like UTF-8 on native Linux, cp 1251 on removable media).

Wonderful. I like this idea. Thanks!

-- Dmitry.

Francois Gouget

10:20 p.m.

On Tue, 2 Apr 2002, Waldek Hebisch wrote:

...

...
misunderstood me and decided that I am proposing to use code page 20866 instead of 1251 for the Russian locale under Wine. But that's not the case. I only proposed creating filenames on disk in the encoding used natively in Linux.

I think that the right way is to have a code conversion option in wine.config, as one of the Wine mount options. That way we will be able to handle most weird configurations (like UTF-8 on native Linux, cp 1251 on removable media).

So if I understand correctly, Linux does not provide a uniform interface to the filesystem. I.e. if I do 'touch ~/foo' where foo contains weird characters I must make sure these are the right characters for the codepage used by ~, and then if I do 'touch /mnt/win98/foo', then I must change 'foo' so that its characters now match the 1251 codepage, and I may have to rewrite foo yet again for 'touch /zipdrive/foo'.

Urgh. This is certainly ugly. I thought that Linux would be taking UTF-8 or something like it for all filesystems and then do the codepage conversions itself depending on the underlying filesystem. I thought that this was the point of having all the codepage information in the kernel for fat filesystems.

-- Francois Gouget fgouget@free.fr http://fgouget.free.fr/ "Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" -- Linus Torvalds

Waldek Hebisch

3 Apr

1:54 p.m.

Francois Gouget wrote:

...

So if I understand correctly, Linux does not provide a uniform interface to the filesystem. I.e. if I do 'touch ~/foo' where foo contains weird characters I must make sure these are the right characters for the codepage used by ~, and then if I do 'touch /mnt/win98/foo', then I must change 'foo' so that its characters now match the 1251 codepage, and I may have to rewrite foo yet again for 'touch /zipdrive/foo'.

Urgh. This is certainly ugly. I thought that Linux would be taking UTF-8 or something like it for all filesystems and then do the codepage conversions itself depending on the underlying filesystem. I thought that this was the point of having all the codepage information in the kernel for fat filesystems.

The native Unix point of view is that a filename is a string, and what it means depends on userspace programs (presentation). In Linux one can easily change the encoding used for presentation; in Poland ISO8859-2 is typical. If a user types a name that contains national characters, that name is stored verbatim on disk. If it is retrieved later using a different encoding, it may appear garbled. A typical Linux installation will choose a "preferred" encoding and set things up so that this encoding works well. In particular, the codepages in the kernel and the mount options allow names on a fat filesystem to be translated from (and to) the "preferred" encoding.

When I wrote about weird setups I meant that it is technically possible to use different encodings on different filesystems, and I can imagine various scenarios that do this (basically to work with software that insists on a specific encoding).

I believe that UTF-8 is the way to go, but it is still the future -- last year, trying to build a UTF-8 system, I found that I needed a bunch of programs which cannot work with UTF-8 (they work well with any 8-bit encoding).

-- Waldek Hebisch hebisch@math.uni.wroc.pl or hebisch@hera.math.uni.wroc.pl

Shachar Shemesh

31 Mar

12:35 p.m.

Francois Gouget wrote:

...

Bug 98 - Unicode, i18n support is quite vague. How could I flesh it out?

For instance I am thinking that we could convert this into a metabug and have one sub bug for each control that still needs to be unicodified. So:

which are the common controls that still need to be unicodified?

are there other controls that need to be unicodified

do common dialogs need work?

As usual the goal is to make the individual tasks more visible so as to give inspiration to potential contributors, and so that we have a better idea of our progress and what remains to be done.

Another issue, which would fall more in the i18n part is BiDi support. What is the scope of this? Does it require modifications of all the controls like unicode?

http://www.winehq.com/hypermail/wine-devel/2002/03/0236.html

Shachar, it sounds like you started working on this, should I say so when I create a bugzilla task for this? Even better, can I assign the task to you? (it can always be reassigned later, it's just a more flexible way of doing things).

Sure, assign it away.

In general, I am not sure what my exact role is going to be, as it seems IBM may be picking up this glove, and they have a month and a half and two men fore on me (not to mention full time vs. off hours).

If you do assign it, use "winebugzilla.at.sun.consumer.org.il" as my email address, please.

Shachar



wine-devel@winehq.org
