From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 22 2004 - 01:19:04 CST
From: "Doug Ewell" <dewell@adelphia.net>
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>> Unicode defines only 4 *standard* normalization forms (NFC, NFD, NFKC,
>> NFKD), but other *non-standard* normalization forms are possible:
>
> But should not be used. It can be tricky enough getting the four
> standard ones right as it is.
Wrong. Non-standard normalization forms are useful too, and can even be safe
if they preserve one of the two standard equivalences (canonical or
compatibility).
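
As a rough illustration of such a "safe" non-standard form, here is a minimal
Python sketch. The names custom_normalize, keep_decomposed and
canonically_equivalent are hypothetical, not part of any standard. It composes
the text as NFC does, then leaves a caller-chosen set of characters decomposed.
The result is neither NFC nor NFD, yet it stays canonically equivalent to the
input.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Two strings are canonically equivalent when their NFD forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def custom_normalize(text: str, keep_decomposed: frozenset = frozenset()) -> str:
    # Hypothetical non-standard form: compose like NFC, but leave the
    # characters listed in keep_decomposed in their decomposed form.
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        unicodedata.normalize("NFD", ch) if ch in keep_decomposed else ch
        for ch in composed
    )


sample = "caf\u00e9 \u00c5ngstr\u00f6m"
result = custom_normalize(sample, frozenset({"\u00e9"}))

# The output differs from both standard canonical forms...
print(result == unicodedata.normalize("NFC", sample))
print(result == unicodedata.normalize("NFD", sample))
# ...but it is still canonically equivalent to the input.
print(canonically_equivalent(sample, result))
```

Replacing a character by its own canonical decomposition never leaves the
canonical equivalence class, which is why the final check holds for any choice
of keep_decomposed.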
There are many cases in which a non-standard normalization form that still
preserves canonical equivalence must be used: NFC and NFD are not always
good enough, either because of the way combining classes are defined and the
fact that they are immutably frozen, or because new characters have been
added to Unicode that can't even have a useful and obvious canonical
equivalence, due to the stability pact.
Some transformations can't be named "normalization" under Unicode, although
they should be: for example, the unification of decomposed SSANG* jamos in
Hangul, or the removal of unnecessary occurrences of CGJ in combining
sequences. Users consider such text transforms to be normalization, but
Unicode sees them differently.
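
The following Python sketch illustrates why Unicode cannot call these
transforms normalization: their output is not canonically equivalent to their
input. The function strip_unneeded_cgj is hypothetical, and its notion of
"unnecessary" is only an assumption made for this example: a CGJ is kept only
where it sits between two combining marks that canonical reordering would
otherwise swap. CGJ has other uses (collation, searching) that this ignores.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, combining class 0


def strip_unneeded_cgj(text: str) -> str:
    # Assumption for this sketch: a CGJ is "needed" only when it separates
    # two marks that canonical reordering would otherwise swap.
    out = []
    for i, ch in enumerate(text):
        if ch == CGJ:
            prev_ccc = unicodedata.combining(text[i - 1]) if i > 0 else 0
            next_ccc = unicodedata.combining(text[i + 1]) if i + 1 < len(text) else 0
            blocks_reordering = prev_ccc > next_ccc > 0
            if not blocks_reordering:
                continue  # drop this CGJ
        out.append(ch)
    return "".join(out)


def canonically_equivalent(a: str, b: str) -> bool:
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


with_cgj = "a" + CGJ + "\u0301"        # a, CGJ, COMBINING ACUTE ACCENT
stripped = strip_unneeded_cgj(with_cgj)

# Removing the CGJ changes the canonical equivalence class.
print(canonically_equivalent(with_cgj, stripped))

# Likewise, a SSANG jamo is not canonically equivalent to the doubled
# simple jamo: U+1101 versus U+1100 U+1100.
print(canonically_equivalent("\u1101", "\u1100\u1100"))
```

Both checks report no canonical equivalence, so a transform of this kind has
to be defined and documented outside the four standard forms.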