Re: Major Defect in Combining Classes of Tibetan Vowels

From: Christopher John Fynn (cfynn@gmx.net)
Date: Fri Jun 27 2003 - 16:37:32 EDT


    Rick McGowan <rick@unicode.org> has privately suggested moving
    the discussion of Combining Classes of *Tibetan* Characters
    from the main Unicode list unicode@unicode.org to the TIBEX list
    tibex@unicode.org - an "experts" list which was set up several
    years ago specifically to discuss proposals for encoding Tibetan
    characters in Unicode. If there are people who have a
    particular interest in Tibetan characters and have been
    following the thread here who would like to continue following
    this thread - perhaps they could ask Rick how they can join that
    list.

    I'll follow Rick's advice, since this discussion is perhaps more
    appropriate on the TIBEX list. Even so, the similar issues with
    some Hebrew characters that have been raised here (again) as a
    result of this thread make me think there may be a need for a
    non-script-specific solution or work-around to problems with
    canonical combining class values.

    Anyway I'm going to move this discussion over there with a
    parting shot...

    Off-list Robert Chilton has pointed out to me the following:

    > 3. A very common occasion of 0F7E occurring with a vowel is in the
    > stack HaUm (orthographic sequence of 0F67 0F71 0F74 0F7E). Because
    > 0F7E is currently assigned a cc of zero, this *same glyph-form*
    > could theoretically be encoded with a total of 6 different
    > character sequences, resulting in 4(!) different sequences
    > following normalization. Properly, all 6 sequences should
    > normalize to the same sequence -- which is indeed the case if 0F82
    > or 0F83 is used in place of 0F7E. Obviously a major problem, not
    > only for rendering but also for searching and sorting.

    FOUR different sequences possible *after* "normalisation" ???
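
    For anyone who wants to check Robert's count, here is a rough
    sketch using Python's standard unicodedata module. It assumes the
    six sequences he means are 0F67 followed by every ordering of the
    three marks, and it uses NFD so that only canonical reordering is
    involved. The helper names are made up for illustration, and the
    result depends on the Unicode data shipped with your Python.

```python
import itertools
import unicodedata

BASE = "\u0F67"  # TIBETAN LETTER HA


def normalized_forms(marks):
    """Return the distinct NFD results over every ordering of marks after BASE."""
    return {
        unicodedata.normalize("NFD", BASE + "".join(order))
        for order in itertools.permutations(marks)
    }


def show(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Print the canonical combining class of each mark under discussion.
for ch in "\u0F71\u0F74\u0F7E\u0F82\u0F83":
    print(f"{ord(ch):04X} ccc={unicodedata.combining(ch)}")

with_0f7e = normalized_forms("\u0F71\u0F74\u0F7E")  # 0F7E has ccc 0
with_0f82 = normalized_forms("\u0F71\u0F74\u0F82")  # 0F82 has a non-zero ccc
with_0f83 = normalized_forms("\u0F71\u0F74\u0F83")  # 0F83 has a non-zero ccc

# List every distinct normalized sequence for each choice of final mark.
for label, forms in (("0F7E", with_0f7e), ("0F82", with_0f82), ("0F83", with_0f83)):
    print(f"{label}: {len(forms)} distinct sequence(s) after NFD")
    for form in sorted(forms):
        print("   ", show(form))
```

    Because 0F7E has a combining class of zero, it acts as a barrier:
    marks on either side of it are never reordered past it, so the
    orderings do not all collapse into one.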

    Personally I would rather have seen all Tibetan characters
    given a CCV of 0 (and all pre-combined Tibetan characters
    *strongly* deprecated) than this. If someone simply
    follows the normal rules for writing Tibetan, then characters
    will be entered in a very predictable order which is far easier
    to process than the one(s) they can end up in after Unicode
    "normalisation".

    - Chris Fynn

    BTW My apologies to anyone who receives two copies of this
    message.


