From: Marcin 'Qrczak' Kowalczyk (email@example.com)
Date: Wed Dec 15 2004 - 08:06:11 CST
"Arcane Jill" <firstname.lastname@example.org> writes:
> Unix makes it possible for /you/ to change /your/ locale - but by
> your reasoning, this is an error, unless all other users do so
Not necessarily: you can change the locale as long as it uses the same
By "error" I mean "a bad idea". The system does not prevent you from
changing the locale to a different encoding. But then you are on your
own and various things can break: terminal output will be mangled, you
can't enter characters used in a different encoding from the keyboard,
text files will be illegible, and Unicode programs which process texts
may reject your data or even filenames. If you still need to change
encodings, it's safer to use ASCII-only filenames.
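To illustrate why ASCII-only filenames are the safe case, here is a
minimal Python sketch. The function name same_in_all and the list of
encodings are my own choices for illustration; the only assumption is
that the encodings involved are ASCII-compatible, as the common Unix
locale encodings are.

```python
def same_in_all(raw: bytes,
                encodings=("utf-8", "iso8859_2", "cp1250")) -> bool:
    """Return True if the raw filename bytes decode to the same text
    under every encoding in the (illustrative) list."""
    results = set()
    for enc in encodings:
        try:
            results.add(raw.decode(enc))
        except UnicodeDecodeError:
            # Record "not decodable in this encoding" as a distinct result.
            results.add(None)
    return len(results) == 1
```

A pure ASCII name such as b"report.txt" reads the same everywhere,
while the ISO-8859-2 bytes of a Polish name are not even valid UTF-8.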
This situation is temporary. Well, it may last 10 more years or so,
but it will probably gradually improve:
First, more protocols and file formats are becoming aware of character
encodings and either label them explicitly or use a known encoding
(generally some Unicode encoding scheme). This is especially true of
protocols for data interchange over the Internet: WWW, email, Usenet,
and modern instant messaging protocols like Jabber. Some old protocols remain
encoding-ignorant, e.g. irc and finger. GNOME 1 used the locale
encoding, GNOME 2 uses UTF-8. Copying and pasting text in X now
has a separate API which uses UTF-8. While the irc protocol doesn't
specify the encoding, the irssi client can now recode texts itself
to conform to customs of particular channels.
Second, UTF-8 is becoming more usable as the default encoding
specified by the locale. I don't use it now because too many things
still break, but it's improving: there are things which didn't work
just a few years ago and work now. Terminal emulators in X widely
support UTF-8 mode now. The curses library now has a working wide
character API. Emacs and vi work in UTF-8 (Emacs still has problems).
Readline now works in UTF-8. Localized messages (gettext) are now
Other programs still don't work. Bash works, while zsh and ksh don't.
Most full-screen text programs use the narrow character curses API and
don't work in UTF-8. Brokenness of interactive interpreters of various
BTW, in the wide character curses API, the only way curses can work
in a UTF-8 terminal, characters are expressed as sequences of wchar_t
(base char + some combining chars, possibly double width). Which means
that you must somehow translate filenames to this representation
in order to display them - same as with a Unicode-based GUI. It's
meaningless to render arbitrary bytes on the terminal, and you can't
force curses to emit the original byte sequences which form filenames
(which would be a bad idea for control characters anyway). By
legitimizing non-UTF-8 filenames in a UTF-8 system you increase the
problems such applications must overcome: not only do they have to
show control characters somehow, but also invalid UTF-8.
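A rough sketch of that translation step, in Python rather than with
the curses wchar_t API: decode the filename bytes as UTF-8 and make
both invalid bytes and control characters visible as escapes. The
function name displayable and the \xNN notation are assumptions made
for this example, not something curses prescribes.

```python
import unicodedata

def displayable(raw: bytes) -> str:
    """Return a printable rendering of raw filename bytes.

    For display only: a literal backslash in the name is not escaped,
    so the result cannot be mapped back to the original bytes.
    """
    # Each byte that is not valid UTF-8 becomes a visible \xNN escape.
    text = raw.decode("utf-8", errors="backslashreplace")
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":
            # Control characters (C0, DEL, C1) are escaped as well.
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

Valid UTF-8 names pass through unchanged, so the extra work is only
visible for names which contain control characters or stray bytes.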
> But it goes beyond that. Copy a file onto a floppy disc and then
> physically take that floppy disc to a different Unix machine and log
> on as "guest" and insert the disc ... Will the filename look the same?
Depends on the filesystem and the way it is mounted.
For example if it's FAT with long filenames (which I think is the
usual format for floppies even on Unix), filenames can be recoded by
the kernel: you specify the encoding to present filenames in and the
encoding of short names. I don't know what happens with filenames
which are not expressible in the selected encoding.
In this way filenames may automatically convert between systems which
use different default encodings, preserving the character semantics
rather than the byte representation. Of course file contents will not
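The effect of such recoding on a single filename can be sketched in
Python. This only illustrates the idea of preserving characters
instead of bytes; it is not how the kernel implements it, and the
function name recode_name is made up for the example.

```python
def recode_name(raw: bytes, src: str, dst: str) -> bytes:
    """Re-express a filename stored in encoding src in encoding dst.

    The characters stay the same; the byte sequence usually changes.
    """
    return raw.decode(src).encode(dst)

# Example: a name written on an ISO-8859-2 system, shown on a UTF-8 one.
on_disk = "żółw.txt".encode("iso8859_2")
shown = recode_name(on_disk, "iso8859_2", "utf-8")
```

In this sketch a name which is not expressible in the target encoding
raises UnicodeEncodeError; that says nothing about what the kernel
does in the same situation.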
-- 
   __("<         Marcin Kowalczyk
   \__/       email@example.com
    ^^     http://qrnik.knm.org.pl/~qrczak/