Re: Normalization Form KC for Linux

From: Mark Davis (mark@macchiato.com)
Date: Wed Aug 18 1999 - 10:03:47 EDT


BTW, http://www.cl.cam.ac.uk/~mgk25/unicode.html is missing a charset setting.
I noticed this because I had my browser set to utf-8, and the character
A-umlaut showed up as G-dot_above. I'd also recommend trying out the HTML
validator at http://validator.w3.org/. I've been slowly working through my web
pages to fix HTML problems that various editors left in.

Mark(us)

Markus Kuhn wrote:

> I was never too happy with the UCS implementation levels, and after
> reading Unicode Tech Report #15, I think I have now seen the light and I
> have just added in
>
> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>
> in section "How should Unicode be used under Linux?" the following
> paragraph:
>
> One day, combining characters will surely be supported under Linux, but
> even then the precomposed characters should be preferred over combining
> character sequences where available. More formally, the preferred way of
> encoding text in Unicode under Linux should be Normalization Form KC as
> defined in Unicode Technical Report #15
> <http://www.unicode.org/unicode/reports/tr15/>.
>
> I hope this recommendation meets general approval. I would even suggest
> that programs such as less and ls could be extended to replace
> characters on output by \xx hex escape sequences if they find in file
> names or text files characters that are not conforming to Normalization
> Form KC, such that these potential trouble-makers can be spotted more
> easily by users.
>
> It might be a very nice idea to have all the Unicode Normalization forms
> added to GNU recode or iconv.
>
> Markus
>
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
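
To make the quoted suggestion concrete, here is a minimal Python sketch of
what checking for Normalization Form KC and escaping non-conforming text
might look like. It relies on the standard unicodedata module; the helper
names (to_nfkc, is_nfkc, escape_non_nfkc) are hypothetical, and the choice
to escape the UTF-8 bytes as \xx is an assumption about what "\xx hex
escape sequences" would mean in practice. It tests each starter together
with its trailing combining marks, which is a simplification: compositions
between two starters (Hangul jamo, for example) are not detected.

```python
import unicodedata


def to_nfkc(text):
    """Return text converted to Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


def is_nfkc(text):
    """Return True if text is unchanged by NFKC normalization."""
    return to_nfkc(text) == text


def _clusters(text):
    """Yield runs of one starter plus the combining marks that follow it.

    Simplification: a "starter" here is any character whose canonical
    combining class is 0.
    """
    run = ""
    for ch in text:
        if run and unicodedata.combining(ch) == 0:
            yield run
            run = ""
        run += ch
    if run:
        yield run


def escape_non_nfkc(text):
    """Replace runs that are not in NFKC with \\xx escapes of their UTF-8 bytes.

    Assumption: escaping the UTF-8 bytes is only one possible reading of
    the "\\xx hex escape" idea; a tool could equally print code points.
    """
    out = []
    for run in _clusters(text):
        if is_nfkc(run):
            out.append(run)
        else:
            out.append("".join("\\x%02x" % b for b in run.encode("utf-8")))
    return "".join(out)
```

With this sketch, a precomposed A-umlaut (U+00C4) passes through untouched,
while "A" followed by COMBINING DIAERESIS (U+0308) is flagged and shown as
escaped bytes, as is a compatibility character such as the "fi" ligature
(U+FB01).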


