RE: Normalisation stability, was: Compression through normalization

From: Philippe Verdy (
Date: Tue Nov 25 2003 - 18:27:36 EST


    Rick McGowan writes:
    > John Cowan suggested...
    > > We will never come close to exceeding this limit. Essentially all new
    > > combining characters are either class 0 or fall into one of the
    > > 200-range positional classes.
    > Or 9, for viramas.

    Or 1, for overlays. Don't forget them...

    Or 7, for nuktas that will need to combine first on the base letter of an
    abjad, before applying a virama. But I do think that viramas, even in Indic
    scripts, act as if they were real vowels, i.e. plain letters (only

    But as soon as we encode logical interaction with viramas and nuktas, we
    break the positional model. A script that uses both fixed positional (1 or
    200+) and logical (10 to 199) combining classes appears broken to me. There
    will always be complex problems related to the possible interactions between
    nuktas/viramas/starter-combining characters and fixed positional characters.
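
    To make the class values mentioned above concrete, here is a minimal
    sketch using Python's standard unicodedata module. The sample characters
    and the helper names (MARKS, combining_classes, reorder) are my own
    illustrative choices, not anything defined by the standard. It looks up
    the canonical combining class of a few marks and shows how canonical
    reordering moves a nukta (class 7) in front of a virama (class 9).

```python
import unicodedata

# Illustrative sample marks (hypothetical selection, one per class discussed).
MARKS = {
    "overlay": "\u0338",            # COMBINING LONG SOLIDUS OVERLAY
    "nukta": "\u093C",              # DEVANAGARI SIGN NUKTA
    "voiced sound mark": "\u3099",  # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
    "virama": "\u094D",             # DEVANAGARI SIGN VIRAMA
    "dot below": "\u0323",          # COMBINING DOT BELOW
    "acute above": "\u0301",        # COMBINING ACUTE ACCENT
}


def combining_classes(marks):
    """Map each mark name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in marks.items()}


def reorder(text):
    """Apply canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)


classes = combining_classes(MARKS)

# KA + virama + nukta: the nukta has the lower class, so it sorts first.
swapped = "\u0915\u094D\u093C"
canonical = reorder(swapped)

for name, ccc in sorted(classes.items(), key=lambda item: item[1]):
    print(f"{ccc:3d}  {name}")
print([f"U+{ord(ch):04X}" for ch in canonical])
```

    The reordering is driven purely by the numeric class, whether that class
    was assigned for a position or for a logical role, which is where the
    interactions described above come from.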

    A script should choose between the logical encoding of its diacritics (like
    nuktas, viramas and Kana voiced sound marks), and the positional encoding of
    its diacritics (like in Latin/Greek/Cyrillic), but never attempt to mix them
    (this is not what occurred on abjads in Unicode, and this is the source of
    all problems). If there are difficulties, that script should not attempt to
    use any positional system, but use only logical combining classes, or
    starter combining characters (with class 0).
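
    As a rough way to check whether a given set of marks mixes the two
    models, the sketch below classifies each character by its combining
    class. The split into "positional" (1 or 200 and up) and "logical"
    (every other non-zero class) follows the usage in this message and is an
    assumption, not terminology from the standard; the function names are
    hypothetical.

```python
import unicodedata


def model_of(ch):
    """Classify a character by combining class, using this message's split."""
    ccc = unicodedata.combining(ch)
    if ccc == 0:
        return "starter"
    if ccc == 1 or ccc >= 200:
        return "positional"   # overlays and the 200-range classes
    return "logical"          # assumed: every other non-zero class


def models_used(marks):
    """Return the set of non-starter models found among the given marks."""
    return {model_of(ch) for ch in marks} - {"starter"}


def mixes_models(marks):
    """True when the marks use both positional and logical classes."""
    return len(models_used(marks)) > 1


# HEBREW POINT HIRIQ (class 14) with HEBREW ACCENT ETNAHTA (class 220)
hebrew_sample = "\u05B4\u0591"
# COMBINING ACUTE ACCENT (230) with COMBINING DOT BELOW (220)
latin_sample = "\u0301\u0323"

print(mixes_models(hebrew_sample), models_used(hebrew_sample))
print(mixes_models(latin_sample), models_used(latin_sample))
```

    Run over all the combining marks of a script, a check like this would
    show which scripts stay within one model and which ones combine both.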


    This archive was generated by hypermail 2.1.5 : Tue Nov 25 2003 - 19:04:55 EST