Francois Gouget wrote:
So if I understand correctly, Linux does not provide a uniform interface to the filesystem. I.e. if I do 'touch ~/foo' where foo contains weird characters I must make sure these are the right characters for the codepage used by ~, and then if I do 'touch /mnt/win98/foo', then I must change 'foo' so that its characters now match the 1251 codepage, and I may have to rewrite foo yet again for 'touch /zipdrive/foo'.
Urgh. This is certainly ugly. I thought that Linux would take UTF-8 or something like it for all filesystems and then do the codepage conversions itself depending on the underlying filesystem. I thought that this was the point of having all the codepage information in the kernel for FAT filesystems.
The native Unix point of view is that a filename is just a string of bytes, and what it means depends on the userspace programs (presentation). On Linux one can easily change the encoding used for presentation; in Poland ISO 8859-2 is typical. If a user types a name which contains national characters, that name is stored verbatim on disk. If it is retrieved later using a different encoding it may appear garbled. A typical Linux installation will choose a "preferred" encoding and set things up so that this encoding works well. In particular, the codepages in the kernel and the mount options allow names on a FAT filesystem to be translated from (to) the "preferred" encoding.
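The FAT translation described above is configured through vfat mount options; a minimal sketch, assuming a Polish setup (the device and mount point below are placeholders, not from the original mail):

```shell
# Mount a Windows FAT partition so that on-disk names are translated
# to the local 8-bit encoding:
#   codepage=  governs the DOS short-name codepage on disk,
#   iocharset= the encoding presented to userspace.
# /dev/hda1 and /mnt/win98 are hypothetical; substitute your own.
mount -t vfat -o codepage=852,iocharset=iso8859-2 /dev/hda1 /mnt/win98
```

With these options the kernel converts names on this one filesystem only; names on native Unix filesystems are still stored verbatim.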
When I wrote about a weird setup I meant that technically it is possible to use different encodings on different filesystems, and I can imagine various scenarios that do this (basically, to work with software that insists on a specific encoding).
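Because the kernel stores filename bytes verbatim on native filesystems, the same on-disk name renders differently under each presentation encoding. A small illustration of this (not from the original mail): the single byte 0xB1 is one character in ISO 8859-2 and another in ISO 8859-1.

```shell
# One byte, two presentations: 0xB1 is LATIN SMALL LETTER A WITH
# OGONEK (a-ogonek) in ISO 8859-2, but PLUS-MINUS SIGN in ISO 8859-1.
printf '\xb1' | iconv -f ISO-8859-2 -t UTF-8   # prints: ą
printf '\xb1' | iconv -f ISO-8859-1 -t UTF-8   # prints: ±
```

A filename created under a Polish (ISO 8859-2) locale and later listed under a Latin-1 setup garbles in exactly this way.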
I believe that UTF-8 is the way to go, but it is still the future -- last year, trying to set up a UTF-8 system, I found that a bunch of programs I needed could not work with UTF-8 (though they work well with any 8-bit encoding).