Francois Gouget wrote:
So if I understand correctly, Linux does not provide a uniform interface to the filesystem. I.e. if I do 'touch ~/foo' where foo contains weird characters I must make sure these are the right characters for the codepage used by ~, and then if I do 'touch /mnt/win98/foo', then I must change 'foo' so that its characters now match the 1251 codepage, and I may have to rewrite foo yet again for 'touch /zipdrive/foo'.
Urgh. This is certainly ugly. I thought that Linux would take UTF-8 or something like it for all filesystems and then do the codepage conversions itself depending on the underlying filesystem. I thought that this was the point of having all the codepage information in the kernel for FAT filesystems.
The native Unix point of view is that a filename is just a string of bytes, and what it means depends on the userspace programs (presentation). On Linux one can easily change the encoding used for presentation; in Poland ISO 8859-2 is typical. If a user types a name which contains national characters, that name is stored verbatim on disk. If it is retrieved later using a different encoding it may appear garbled. A typical Linux installation will choose a "preferred" encoding and set things up so that this encoding works well. In particular, the codepages in the kernel and the mount options allow names on a FAT filesystem to be translated from (to) the "preferred" encoding.
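The FAT translation described above is configured through vfat mount options; a minimal sketch, assuming a Polish setup (the device and mount point below are placeholders, not from the original mail):

```shell
# Mount a Windows FAT partition so that on-disk names are translated
# to the local 8-bit encoding:
#   codepage=  governs the DOS short-name codepage on disk,
#   iocharset= the encoding presented to userspace.
# /dev/hda1 and /mnt/win98 are hypothetical; substitute your own.
mount -t vfat -o codepage=852,iocharset=iso8859-2 /dev/hda1 /mnt/win98
```

With these options the kernel converts names on this one filesystem only; names on native Unix filesystems are still stored verbatim.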
When I wrote about a weird setup I meant that technically it is possible to use different encodings on different filesystems, and I can imagine various scenarios that do this (basically, to work with software that insists on a specific encoding).
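Because the kernel stores filename bytes verbatim on native filesystems, the same on-disk name renders differently under each presentation encoding. A small illustration of this (not from the original mail): the single byte 0xB1 is one character in ISO 8859-2 and another in ISO 8859-1.

```shell
# One byte, two presentations: 0xB1 is LATIN SMALL LETTER A WITH
# OGONEK (a-ogonek) in ISO 8859-2, but PLUS-MINUS SIGN in ISO 8859-1.
printf '\xb1' | iconv -f ISO-8859-2 -t UTF-8   # prints: ą
printf '\xb1' | iconv -f ISO-8859-1 -t UTF-8   # prints: ±
```

A filename created under a Polish (ISO 8859-2) locale and later listed under a Latin-1 setup garbles in exactly this way.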
I believe that UTF-8 is the way to go, but it is still the future -- last year, trying to set up a UTF-8 system, I found that a bunch of programs I needed could not work with UTF-8 (though they work well with any 8-bit encoding).