Re: unicode on Linux

From: Stephane Bortzmeyer (bortzmeyer@nic.fr)
Date: Thu Oct 23 2003 - 02:47:46 CST


On Tue, Oct 21, 2003 at 11:32:28AM -0400,
 Edward H. Trager <ehtrager@umich.edu> wrote
 a message of 118 lines which said:

> I think there can be big debates about whether a Linux (or any *nix
> kernel, for that matter) has any business normalizing file names.
> Personally I think Unicode normalization is not the kernel's
> business. This is better left to the userland applications.

I do not agree. It would mean that *each* application has to
normalize, because it cannot rely on the kernel. This has huge
security implications (two file names that are identical once
normalized to NFC, so visually impossible to distinguish, but which
are two different strings of code points).

Normalization has to be done in the kernel for the same reason that
access control (the rwx bits in Unix) has to be in the kernel: so that
no application can bypass it.
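
To make the two-names problem concrete, here is a small Python sketch.
The file name "café.txt" and the helper same_after_nfc are my own
illustrative choices, not anything a kernel or filesystem provides; the
only library call used is unicodedata.normalize from the standard
library.

```python
import unicodedata

# Two spellings of the same visible name: precomposed e-acute (U+00E9)
# versus "e" followed by a combining acute accent (U+0301).
composed = "caf\u00e9.txt"
decomposed = "cafe\u0301.txt"


def same_after_nfc(a: str, b: str) -> bool:
    """Return True when both names normalize to the same NFC string."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# As raw strings (and as UTF-8 bytes, which is what a byte-oriented
# filesystem compares) the two names differ.
print(composed == decomposed)
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# After NFC normalization they are the same name.
print(same_after_nfc(composed, decomposed))
```

On a filesystem that compares names byte by byte, both names could
exist side by side in one directory, and a user looking at a listing
would have no way to tell them apart.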

> Are you sure about ls? ls should sort UTF-8-encoded file names in
> raw Unicode order, n'est-ce pas?

Yes, but this order has no linguistic meaning (in French, é should not
sort after z).
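
A short Python sketch of the difference, with a word list I made up for
the example. The rough_key function is a hypothetical and deliberately
crude stand-in (it strips accents before comparing); it is not real
French collation, for which one would use a locale-aware comparison or
a library such as ICU.

```python
import unicodedata

# Sample words, written with escapes so the code points are unambiguous:
# "zèbre", "été", "arbre".
words = ["z\u00e8bre", "\u00e9t\u00e9", "arbre"]

# Raw code point order: "é" (U+00E9) comes after "z" (U+007A).
raw_order = sorted(words)


def rough_key(s: str):
    """Crude approximation: compare on the unaccented base letters first."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original string breaks ties between words with the same base.
    return (base.casefold(), s)


approx_order = sorted(words, key=rough_key)

print(raw_order)     # "été" lands last, after "zèbre"
print(approx_order)  # "été" lands between "arbre" and "zèbre"
```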

> What about ICU's regexp package?
> (http://oss.software.ibm.com/icu/userguide/regexp.html) You should
> be able to use ICU on *any* platform. Linux does not yet having a
> Unicode grep

I never said that Unix cannot be "Unicodized". I just said that it is
not Unicodized. That's why I talked about an "act of faith". You need
to configure many things and to compile many things before you have a
working Unicode environment.

> I thought both Postgres and MySQL already have, or are working on
> this issue?

Neither of them has it. They claim "Unicode support", which only means
that they can store and retrieve UTF-8.
 


