Re: Linux and UTF8 filenames

From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Sep 16 2002 - 08:10:19 EDT


On Mon, 16 Sep 2002, Martin Kochanski wrote:

> 'ls') will do. Is it universally the case that the tools will assume
> that those byte-sequence filenames are in UTF8 (....
> ...........)? Or do they assume a standard locale
> (...............................)? Or is this a switchable option that
> the user can set?

   You can just set the locale to a UTF-8 locale, and all file names
are then treated as UTF-8 by properly I18Nized tools like ls (in the
case of shell tools like 'ls', you also need a UTF-8 terminal; there
are a few UTF-8-capable terminals, including xterm). Nowadays, most
Linux distributions come with dozens of UTF-8 locales (e.g. en_US.UTF-8,
ja_JP.UTF-8, ko_KR.UTF-8, fr_CA.UTF-8). For more information on
Linux and UTF-8, refer to
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>. There's also a mailing
list dedicated to the topic (linux-utf8), which is mentioned in the FAQ.
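
   To make the "switchable by locale" point concrete, here is a rough
sketch in Python of how a program might guess whether the user's
environment selects a UTF-8 locale. The function names are made up for
illustration, and the string parsing is only an approximation: a real
tool would call setlocale() and ask nl_langinfo(CODESET) instead, since
a name without an explicit codeset (e.g. plain "en_US") can still map to
UTF-8 on some systems.

```python
import os

# Sketch only: guess from the environment whether a UTF-8 locale is selected.
# The helper names below are hypothetical, not part of any library.


def effective_ctype_locale(env):
    """Return the locale name that governs character handling.

    POSIX precedence is LC_ALL, then LC_CTYPE, then LANG; empty values
    count as unset. Falls back to "C" when none of them is set.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def names_utf8_codeset(locale_name):
    """True if the name carries an explicit UTF-8 codeset, e.g. en_US.UTF-8."""
    # Locale names look like language[_territory][.codeset][@modifier].
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return False  # no codeset given; the system default applies
    codeset = base.split(".", 1)[1]
    # Accept the common spellings "UTF-8", "utf8", "UTF8", and so on.
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    name = effective_ctype_locale(os.environ)
    print(name, names_utf8_codeset(name))
```

Running a tool as, say, LC_ALL=ja_JP.UTF-8 changes what this reports
without touching a single byte of the file names on disk.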

> In any case, how can a poor innocent server discover
> enough about the context in which it is running to know what filename
> it has to use so that a user who lists a file directory will see "...."
> on his screen?

  As you wrote, Linux (and most Unix) filesystems don't store any
information about the encoding used for file names, but the way the
names are interpreted can easily be switched by changing the locale.
Because it seems you are building your server from scratch,
you don't have to worry about converting old filenames in various
legacy encodings to UTF-8 (not that it's hard to do) and can just
create all filenames in UTF-8 and always use one of the UTF-8 locales.
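
  As an illustration of "always UTF-8", the following Python sketch
has a server hand the kernel explicit UTF-8 bytes for each file name,
so the result does not depend on whatever locale the server happens to
run under. It also shows the legacy-to-UTF-8 conversion mentioned
above. The function names and the sample file name are assumptions made
up for this example, and it assumes a Linux-style filesystem that
stores names as uninterpreted bytes.

```python
import os
import tempfile

# Sketch only: hypothetical helpers, assuming a byte-oriented filesystem.


def utf8_filename(name):
    """Encode a text file name as UTF-8 bytes, independent of the locale."""
    return name.encode("utf-8")


def legacy_to_utf8(raw_name, legacy_encoding):
    """Re-encode file name bytes from a legacy encoding to UTF-8."""
    return raw_name.decode(legacy_encoding).encode("utf-8")


def create_file(directory, name):
    """Create an empty file whose on-disk name is the UTF-8 form of name."""
    # With a bytes path, Python passes the bytes to the kernel unchanged.
    path = os.path.join(os.fsencode(directory), utf8_filename(name))
    with open(path, "wb"):
        pass
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        create_file(tmp, "日本語.txt")
        # Listing with a bytes argument returns the raw stored names.
        print(os.listdir(os.fsencode(tmp)))
```

A user who then lists that directory from a UTF-8 locale on a UTF-8
terminal should see the intended characters; under a non-UTF-8 locale
the same bytes would be shown differently.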

  Jungshik


