Normalization Form KC for Linux

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Wed Aug 18 1999 - 04:41:23 EDT


I was never too happy with the UCS implementation levels. After
reading Unicode Technical Report #15, I think I have now seen the
light, and I have just added to

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

in section "How should Unicode be used under Linux?" the following
paragraph:

  One day, combining characters will surely be supported under Linux, but
  even then the precomposed characters should be preferred over combining
  character sequences where available. More formally, the preferred way of
  encoding text in Unicode under Linux should be Normalization Form KC as
  defined in Unicode Technical Report #15
  <http://www.unicode.org/unicode/reports/tr15/>.
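
To make the recommendation concrete, here is a minimal sketch of what
converting text to Normalization Form KC does. It assumes Python and its
standard unicodedata module, neither of which is mentioned above; the
helper name to_nfkc is purely illustrative.

```python
import unicodedata

def to_nfkc(text: str) -> str:
    """Return text in Normalization Form KC (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", text)

# A combining character sequence: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFKC prefers the precomposed character U+00E9 where one is available.
precomposed = to_nfkc(decomposed)

# The "K" (compatibility) part also replaces compatibility characters,
# for example U+FB01 LATIN SMALL LIGATURE FI becomes the two letters "fi".
ligature_replaced = to_nfkc("\ufb01")
```

Text that is already in Normalization Form KC passes through unchanged,
so applying the conversion twice gives the same result as applying it
once.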

I hope this recommendation meets general approval. I would even suggest
that programs such as less and ls be extended to replace, on output,
any characters in file names or text files that do not conform to
Normalization Form KC with \xx hex escape sequences, so that users can
spot these potential troublemakers more easily.
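
The escaping idea could be sketched roughly as follows. This is not how
less or ls work; it is an illustration in Python under some stated
assumptions: the text is split into a base character plus any following
combining marks, each such group is compared with its NFKC form, and the
non-ASCII characters of a non-conforming group are printed as \xx
escapes of their UTF-8 bytes. The function name escape_non_nfkc is
hypothetical, and the grouping is an approximation that can miss
sequences which compose without combining marks (for example Hangul
jamo).

```python
import unicodedata

def escape_non_nfkc(text: str) -> str:
    """Hex-escape the parts of text that are not in NFKC (approximate sketch)."""
    # Group each base character with the combining marks that follow it.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)

    out = []
    for cluster in clusters:
        if unicodedata.normalize("NFKC", cluster) == cluster:
            # Already conforming: print as-is.
            out.append(cluster)
            continue
        # Non-conforming: keep ASCII readable, escape everything else
        # as the \xx hex values of its UTF-8 bytes.
        for ch in cluster:
            if ord(ch) < 0x80:
                out.append(ch)
            else:
                out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)
```

With this sketch, a file name containing "e" plus a combining acute
accent would be shown with the accent escaped, while the same name using
the precomposed character would be shown unchanged.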

It might be a very nice idea to have all the Unicode Normalization forms
added to GNU recode or iconv.
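
As a rough idea of what such a conversion filter might look like, here
is a small Python sketch. It is not the interface of GNU recode or
iconv; the function name normalize_lines and its parameters are
assumptions made for illustration, and the actual work is delegated to
Python's standard unicodedata module.

```python
import unicodedata
from typing import Iterable, Iterator

# The four normalization forms defined in Unicode Technical Report #15.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize_lines(lines: Iterable[str], form: str = "NFKC") -> Iterator[str]:
    """Yield each input line converted to the requested normalization form."""
    if form not in FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    for line in lines:
        # Normalizing line by line is safe here because a line feed never
        # combines with the characters around it.
        yield unicodedata.normalize(form, line)
```

Used as a filter, one could pass sys.stdin as the lines argument and
write each yielded line to sys.stdout.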

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


