Re: unicode on Linux

From: Stephane Bortzmeyer (
Date: Tue Oct 21 2003 - 06:43:43 CST

On Mon, Oct 20, 2003 at 10:14:22PM +0200,
 Stefan Persson <> wrote
 a message of 23 lines which said:

> >Just wondering if anybody knows how unicode is on Linux?
> >
> Very good support.

Very optimistic.


1) File names in Unicode: no (well, the Linux kernel is 8-bit clean,
so you can always encode in UTF-8, but the kernel does not do any
normalization and the applications do not expect UTF-8; for instance,
ls sorts alphabetically but does not know Unicode sorting).

2) User names: worse, since the utilities that create an account refuse


3) grep: no Unicode regexp

4) xterm (or similar virtual terminals): No BiDi support at all

5) shells: I'm not aware of any line-editing shell (zsh, tcsh)
that has Unicode character semantics (back-character should move one
character, not one byte).

6) databases: I'm not aware of a free DBMS which has support for
Unicode sorting (SQL's ORDER BY) or regexps (SQL's LIKE).

7) Serious word processing: LaTeX has only very minimal Unicode support.
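
To make point 1 concrete, here is a minimal Python sketch (the file
names are made up for illustration). It shows that two normalization
forms of the same visible name give different byte strings, which an
8-bit-clean kernel would treat as two different files, and that sorting
by raw bytes puts an accented name after "z". It is not a model of what
any particular ls does.

```python
import unicodedata

# Two spellings of the same visible (hypothetical) file name.
nfc = unicodedata.normalize("NFC", "\u00e9.txt")  # precomposed e-acute
nfd = unicodedata.normalize("NFD", "\u00e9.txt")  # "e" + combining acute

# Identical to a reader, different byte strings to a kernel that
# only sees bytes and performs no normalization.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")
print(nfc_bytes, nfd_bytes, nfc_bytes == nfd_bytes)

# Sorting by raw UTF-8 bytes: every non-ASCII lead byte is >= 0xC2,
# so the accented name lands after all the ASCII-only names.
names = ["zebra", "\u00e9cole", "eclair"]
byte_sorted = sorted(names, key=lambda s: s.encode("utf-8"))
print(byte_sorted)
```

A language-aware ordering would need a real collation algorithm with
locale data; the byte sort above is only the naive baseline.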

Also, many applications (exmh, emacs) are ten times slower when
running in UTF-8 mode.
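
Returning to point 5 above, this is roughly what "move back one
character, not one byte" means for a line editor that stores its buffer
as UTF-8. The function name prev_char_start is hypothetical, and the
sketch assumes the buffer is valid UTF-8 and that the cursor already
sits on a character boundary. It steps over one code point only;
combining sequences would need more work.

```python
def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the byte offset where the character before `pos` starts."""
    if pos <= 0:
        return 0
    pos -= 1
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx;
    # skip back over them until we reach a lead byte.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


# "naive" with a diaeresis on the i, a space, and a euro sign:
# the i-diaeresis takes 2 bytes, the euro sign takes 3.
line = "na\u00efve \u20ac".encode("utf-8")

cursor = len(line)                       # cursor at end of line
cursor = prev_char_start(line, cursor)   # steps back over the euro sign
print(cursor)
```

A byte-oriented editor would instead do cursor -= 1 and leave the
cursor in the middle of the euro sign.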

At the present time, using Unicode on Unix is an act of faith.

> Default charset for recent versions of some popular distributions.

Yes, RedHat changed the default charset to Unicode without considering
that existing text files would no longer be readable.
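
A small Python sketch of that problem, assuming the old files were
written in ISO 8859-1 (Latin-1); the sample word and the helper name
try_decode are only for illustration. Bytes that were fine under the old
charset are simply not valid UTF-8.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# A word as it would sit in a file saved under a Latin-1 locale.
legacy = "caf\u00e9".encode("latin-1")

as_latin1 = try_decode(legacy, "latin-1")  # decodes fine
as_utf8 = try_decode(legacy, "utf-8")      # fails: 0xE9 is not valid here

# A tolerant reader can only substitute U+FFFD for the bad byte.
lossy = legacy.decode("utf-8", errors="replace")
print(as_latin1, as_utf8, lossy)
```

Converting such files needs the original charset to be known; it cannot
be recovered reliably from the bytes alone.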

