Re: [hebrew] Re: Hebrew combining classes (was ISO 10646 compliance and EU law)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jan 16 2005 - 19:11:04 CST


    From: "Doug Ewell" <dewell@adelphia.net>
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >
    >>> Elaine, the good news for you is that if you order your Unicode
    >>> Hebrew text according to these 'alternative combining classes' you
    >>> will not be deviating at all from the Unicode standard. Your text
    >>> will not be normalised in any of the standard normalisation forms,
    >>> but the standard nowhere specifies that texts must be normalised. Of
    >>> course you need to ensure that your text is not normalised by other
    >>> processes, or that if it is you then restore it to the order of the
    >>> 'alternative combining classes' - a process which should be
    >>> reversible.
    >>
    >> Note that you can't define "alternative combining classes" the way you
    >> want, if you need to preserve canonical equivalence.
    >
    > Isn't that what Peter said? If you don't care about standard
    > normalization forms, you don't care about canonical equivalence.

    That was a different point. It spoke only about alternate normalization
    forms; in my opinion, a transformation based on "alternative combining
    classes" should not be called a "normalization" if it does not preserve
    canonical equivalence.
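
    For illustration, a minimal sketch in Python (assuming only the standard
    unicodedata module) of what canonical equivalence does and does not allow:
    marks with different combining classes may be reordered, but marks sharing
    a class may not.

        import unicodedata

        base = "a"
        acute = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230 (above)
        cedilla = "\u0327"    # COMBINING CEDILLA, ccc=202 (attached below)

        # Different classes: both orders are canonically equivalent,
        # so they normalize to the same NFD string.
        s1 = base + acute + cedilla
        s2 = base + cedilla + acute
        assert unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)

        # Same class (two marks above, both ccc=230): the orders are NOT
        # canonically equivalent, and normalization preserves their order.
        grave = "\u0300"      # COMBINING GRAVE ACCENT, ccc=230
        t1 = base + acute + grave
        t2 = base + grave + acute
        assert unicodedata.normalize("NFD", t1) != unicodedata.normalize("NFD", t2)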

    My opinion is weakened somewhat by the fact that Unicode also speaks of
    "normalization" when referring to the NFKC and NFKD forms, even though
    they do not preserve canonical equivalence.
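
    A small demonstration of that point (again assuming Python's standard
    unicodedata module): a compatibility decomposition changes the character
    identity, so the NFKD result is not canonically equivalent to the input.

        import unicodedata

        s = "\uFB01"  # LATIN SMALL LIGATURE FI
        # NFD leaves the ligature alone: it has no canonical decomposition.
        assert unicodedata.normalize("NFD", s) == s
        # NFKD replaces it with "fi", a compatibility (not canonical) mapping.
        assert unicodedata.normalize("NFKD", s) == "fi"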

    But a transformation that uses alternate combining class values while
    preserving the partition of characters into their combining classes will
    preserve canonical equivalence, and could be called a "normalization", or
    perhaps more accurately a "denormalization". I am not saying that such a
    process would not be useful. In fact, such transforms already exist as
    parts of other standards.
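
    Here is a sketch of such a "denormalization" (Python; alt_reorder and the
    alternate class function are hypothetical names, not part of any
    standard): stable-sort each run of combining marks by the alternate class
    values. As long as the alternate table assigns equal values to exactly
    those characters whose standard combining classes are equal (the same
    partition), the result stays canonically equivalent to the input.

        import unicodedata

        def alt_reorder(text, alt_ccc):
            out, run = [], []
            def flush():
                # Stable sort: marks with equal alternate classes keep
                # their relative order, as canonical equivalence requires.
                run.sort(key=alt_ccc)
                out.extend(run)
                run.clear()
            for ch in text:
                if unicodedata.combining(ch) == 0:
                    flush()            # starters act as reordering barriers
                    out.append(ch)
                else:
                    run.append(ch)
            flush()
            return "".join(out)

        # Hypothetical alternate classes: negating the standard value keeps
        # the same partition but inverts the standard priority, so marks
        # above now sort before marks below.
        alt = lambda ch: -unicodedata.combining(ch)
        s = "a\u0327\u0301"       # NFD order: cedilla, then acute
        r = alt_reorder(s, alt)   # -> "a\u0301\u0327"
        assert unicodedata.normalize("NFD", r) == unicodedata.normalize("NFD", s)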

    The NFC and NFD forms are not especially useful in themselves, even for
    collation or rendering. They mainly serve compatibility with non-Unicode
    standards that cannot compose or decompose characters themselves. Modern
    text algorithms should be able to process any input text equally in any
    canonically equivalent form, whether it is normalized or not. So the NFC
    and NFD forms, like the existing combining classes, are there only to
    specify which texts are equivalent and should be treated equally. But
    processes will often need to do more to recognize texts (applying an NFC
    or NFD normalization on input before any other denormalization keeps
    those processes conformant to Unicode, and the two operations can be
    combined into one).
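
    A minimal sketch of such a process boundary (Python, standard
    unicodedata module only): normalize on input, then operate only on the
    normalized form, so every canonically equivalent spelling of the same
    text is treated identically.

        import unicodedata

        def canonical_equal(a, b):
            return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

        assert canonical_equal("\u00E9", "e\u0301")  # precomposed vs. decomposed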


