Re: Normalization Form KC for Linux

From: Juliusz Chroboczek
Date: Fri Aug 27 1999 - 14:52:54 EDT

Rick McGowan:

>> More formally, the preferred way of encoding text in Unicode under
>> Linux should be Normalization Form KC as defined in Unicode
>> Technical Report #15

RM> Gosh, I don't approve. And I've been using Unix systems for many
RM> years. The most flexible kind of implementation would prefer
RM> decomposed sequences. In any case, enlightened systems would
RM> accept anything and massage as needed to fit the particular
RM> application instead of forcing (or "suggesting") the user to run
RM> everything through the meat grinder first...

As I understand it, Markus was speaking about interchange formats,
including, but not limited to, file formats and IPC formats. It is
expected that simple applications will only be able to accept
precomposed forms, while enlightened ones (I like the term) will
accept anything. Therefore, requesting that applications *write*
precomposed forms in preference to combining characters maximises the
chances of interchange between simple and complex applications.
Complex applications are still expected to accept arbitrary combining
characters; they should just avoid producing them whenever possible.
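A minimal Python sketch of this "write precomposed, read anything" policy, assuming UTF-8 as the interchange encoding and using NFC as the "prefer precomposed" output form (NFKC would additionally fold compatibility characters). The three function names are hypothetical; only `unicodedata.normalize` and `unicodedata.combining` are standard library calls.

```python
import unicodedata


def write_for_interchange(text: str) -> bytes:
    """Hypothetical writer: emit precomposed characters where possible (NFC), as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def simple_reader_accepts(data: bytes) -> bool:
    """Hypothetical 'simple' reader: copes only with text containing no combining marks."""
    text = data.decode("utf-8")
    # unicodedata.combining() returns a non-zero combining class for combining marks.
    return not any(unicodedata.combining(ch) for ch in text)


def enlightened_read(data: bytes) -> str:
    """Hypothetical 'enlightened' reader: accepts any form, normalizes internally."""
    return unicodedata.normalize("NFC", data.decode("utf-8"))


decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Both strings display as the same glyph but are stored as different code points.
print(len(decomposed), len(precomposed))

# Written in decomposed form, the data is rejected by the simple reader...
print(simple_reader_accepts(decomposed.encode("utf-8")))
# ...but once the writer prefers precomposed forms, the simple reader copes.
print(simple_reader_accepts(write_for_interchange(decomposed)))
# The enlightened reader treats both inputs identically.
print(enlightened_read(decomposed.encode("utf-8")) == precomposed)
```

NFC can only remove combining marks when a precomposed equivalent exists; sequences without one stay decomposed, which is why writers can avoid combining characters only "whenever possible".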

(The question of unification of compatibility forms -- C vs. KC -- is
a different issue altogether; not one I would dare to claim that I am
even vaguely not totally incompetent to have an opinion on.)
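For readers unsure what separates C from KC, a short Python sketch using the standard `unicodedata` module: both forms compose canonically equivalent sequences, but only NFKC also replaces compatibility characters, such as ligatures and superscripts, with their plain counterparts, so that distinction is lost in the stored text. The helper name `forms` and the sample characters are chosen purely for illustration.

```python
import unicodedata


def forms(s: str) -> tuple[str, str]:
    """Hypothetical helper: return (NFC, NFKC) normalizations of s."""
    return unicodedata.normalize("NFC", s), unicodedata.normalize("NFKC", s)


samples = {
    "ligature fi": "\ufb01",          # LATIN SMALL LIGATURE FI
    "superscript two": "\u00b2",      # SUPERSCRIPT TWO
    "e + combining acute": "e\u0301", # canonical composition case
}

for name, s in samples.items():
    nfc, nfkc = forms(s)
    # Show code points so the difference is visible regardless of font.
    print(name,
          [f"U+{ord(c):04X}" for c in nfc],
          [f"U+{ord(c):04X}" for c in nfkc])
```

Only the last sample is treated the same by both forms; the ligature and the superscript survive NFC but are folded to "fi" and "2" by NFKC.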

RM> In any case, I think Unix community tends in general to be very
RM> very confused about the distinction between how data exists in
RM> storage and what appears on one's screen/window/emulator.

While to a certain extent true of the Unix-like community in general,
this is not a fair assessment of Markus' work.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT