Re: Is it roundtripping or transfer-encoding

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 22 2004 - 01:19:04 CST

    From: "Doug Ewell" <dewell@adelphia.net>
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >
    >> Unicode defines only 4 *standard* normalization forms (NFC, NFD, NFKC,
    >> NFKD), but other *non-standard* normalization forms are possible:
    >
    > But should not be used. It can be tricky enough getting the four
    > standard ones right as it is.

    Wrong. Non-standard normalization forms are useful too, and can even be safe
    if they preserve one of the two standard equivalences (canonical or
    compatibility).

    There are many cases in which a non-standard normalization form that still
    preserves canonical equivalence has to be used. NFC and NFD are not always
    good enough, either because of the way combining classes are defined (and
    the fact that they are immutably frozen), or because new characters have
    been added to Unicode that cannot even be given a useful and obvious
    canonical equivalence, because of the stability policy.
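
    To make the idea of a non-standard form that still preserves canonical
    equivalence concrete, here is a minimal Python sketch. The form itself
    (mixed_form) is purely hypothetical and invented for illustration: it
    behaves like NFC but leaves any composed character outside Latin-1
    decomposed. The check simply compares the NFD of both strings, which is
    one common way to test canonical equivalence.

```python
import unicodedata


def mixed_form(text: str) -> str:
    """Hypothetical non-standard form (an assumption, not a Unicode form):
    NFC, except that composed characters outside Latin-1 stay decomposed."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) <= 0xFF:
            out.append(ch)
        else:
            # Canonical decomposition of a single character; characters
            # without a decomposition are returned unchanged.
            out.append(unicodedata.normalize("NFD", ch))
    return "".join(out)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+00E9 (e with acute) is in Latin-1; U+1E69 (s with dot below and dot
# above) is not, so only the second one is decomposed by mixed_form.
sample = "\u00e9\u1e69"
result = mixed_form(sample)
print(canonically_equivalent(sample, result))
```

    The result is neither NFC nor NFD, yet it is canonically equivalent to
    the input, so a conformant process should treat the two as the same text.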

    Some transformations can't be called "normalization" under Unicode, although
    they should be: for example, the unification of decomposed SSANG* jamos in
    Hangul, or the removal of unnecessary occurrences of CGJ (COMBINING
    GRAPHEME JOINER) in combining sequences. Users regard such text transforms
    as normalization, but Unicode sees them differently.
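
    As a rough illustration of the CGJ case, the Python sketch below removes
    every CGJ (U+034F) that does not block canonical reordering. That
    definition of "unnecessary" is an assumption made for this example only:
    CGJ is also used for other purposes (for instance to influence collation),
    which this sketch ignores. The function name strip_redundant_cgj is
    hypothetical. Note that the output is not canonically equivalent to the
    input, since CGJ has no canonical decomposition, which is exactly why
    Unicode does not call this a normalization.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def strip_redundant_cgj(text: str) -> str:
    """Drop each CGJ whose removal leaves canonical ordering unchanged.

    Assumption for this sketch: a CGJ is "necessary" only when it sits
    between two combining marks that would otherwise be reordered, i.e.
    ccc(previous) > ccc(next) > 0. Works on the NFD form of the input,
    where each run of combining marks is already sorted by class.
    """
    nfd = unicodedata.normalize("NFD", text)
    out = []
    for i, ch in enumerate(nfd):
        if ch == CGJ:
            prev_ccc = unicodedata.combining(out[-1]) if out else 0
            next_ccc = unicodedata.combining(nfd[i + 1]) if i + 1 < len(nfd) else 0
            if not (prev_ccc > next_ccc > 0):
                continue  # this CGJ blocks nothing, so it is dropped
        out.append(ch)
    return "".join(out)


# Acute (class 230) then dot below (class 220): the CGJ blocks reordering.
kept = strip_redundant_cgj("a\u0301\u034F\u0323")
# Dot below then acute is already in canonical order: the CGJ is dropped.
dropped = strip_redundant_cgj("a\u0323\u034F\u0301")
print(kept == "a\u0301\u034F\u0323", dropped == "a\u0323\u0301")
```

    Because the kept and dropped cases are decided only from combining
    classes, the remaining marks stay in the same relative order as in the
    NFD of the original text.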


