Re: Normalization Form KC for Linux

From: Mark Davis (
Date: Wed Aug 18 1999 - 10:03:47 EDT

BTW, is missing a charset setting.
I noticed this because I had my browser set on utf-8, and the character
A-umlaut showed up as G-dot_above. I'd also recommend trying out the HTML
validator at I've been slowly working through my web
pages to fix HTML problems that various editors left in.
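
For anyone curious how an A-umlaut can turn into a G-dot_above, here is a
small Python sketch of the effect. It assumes the page was written in
ISO 8859-1 and that the byte following the A-umlaut happened to be 0xA0
(a no-break space); that second byte is my guess for illustration only,
not something I checked in the page source.

```python
import unicodedata

# Assumed bytes on the wire: A-umlaut (0xC4) followed by a no-break
# space (0xA0), both encoded as ISO 8859-1. The 0xA0 is a hypothetical
# choice made for this illustration.
raw = bytes([0xC4, 0xA0])

# Read with the intended charset: two characters.
as_latin1 = raw.decode("iso-8859-1")

# Read by a browser forced to UTF-8: 0xC4 0xA0 is a valid two-byte
# sequence, so it collapses into the single character U+0120.
as_utf8 = raw.decode("utf-8")

for label, text in (("iso-8859-1", as_latin1), ("utf-8", as_utf8)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, names)
```

Declaring the charset, either in the HTTP Content-Type header or in a
meta tag in the page head, removes the guesswork for the browser.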


Markus Kuhn wrote:

> I was never too happy with the UCS implementation levels, and after
> reading Unicode Tech Report #15, I think I have now seen the light and I
> have just added in
> in section "How should Unicode be used under Linux?" the following
> paragraph:
> One day, combining characters will surely be supported under Linux, but
> even then the precomposed characters should be preferred over combining
> character sequences where available. More formally, the preferred way of
> encoding text in Unicode under Linux should be Normalization Form KC as
> defined in Unicode Technical Report #15
> <>.
> I hope this recommendation meets general approval. I would even suggest
> that programs such as less and ls could be extended to replace
> characters on output by \xx hex escape sequences if they find in file
> names or text files characters that are not conforming to Normalization
> Form KC, such that these potential trouble-makers can be spotted more
> easily by users.
> It might be a very nice idea to have all the Unicode Normalization forms
> added to GNU recode or iconv.
> Markus
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at, WWW: <>
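
To make the proposal above a bit more concrete, here is a rough Python
sketch of both ideas: checking whether a string is already in
Normalization Form KC, and escaping the parts that are not, in the way
suggested for ls and less. It uses the standard unicodedata module. The
function names is_nfkc and escape_non_nfkc are invented for this
example, and the splitting into "base character plus combining marks"
is a simplification of mine; it does not handle cases such as
conjoining Hangul jamo, where several starters compose with each other.

```python
import unicodedata


def is_nfkc(text: str) -> bool:
    """Return True if text is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", text) == text


def _segments(text: str):
    """Yield runs of one character plus the combining marks after it.

    Simplification: a new run starts at every character whose canonical
    combining class is 0.
    """
    segment = ""
    for ch in text:
        if segment and unicodedata.combining(ch) == 0:
            yield segment
            segment = ""
        segment += ch
    if segment:
        yield segment


def escape_non_nfkc(text: str) -> str:
    """Replace each run that is not in NFKC by \\xx escapes of its UTF-8 bytes."""
    out = []
    for seg in _segments(text):
        if is_nfkc(seg):
            out.append(seg)
        else:
            out.append("".join(f"\\x{b:02x}" for b in seg.encode("utf-8")))
    return "".join(out)


if __name__ == "__main__":
    precomposed = "caf\u00e9.txt"   # e-acute as one precomposed character
    decomposed = "cafe\u0301.txt"   # e followed by combining acute accent
    print(is_nfkc(precomposed), escape_non_nfkc(precomposed))
    print(is_nfkc(decomposed), escape_non_nfkc(decomposed))
```

With this sketch the precomposed file name passes through unchanged,
while the decomposed one shows up with the "e" and the combining accent
as hex escapes, so the two otherwise identical-looking names can be
told apart in a listing.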
