Re: Interleaved collation of related scripts

From: Philippe Verdy (
Date: Fri May 14 2004 - 12:10:38 CDT

    From: "Michael Everson" <>
    > At 06:35 -0700 2004-05-14, Peter Kirk wrote:
    > >But there is an exceptional issue within the family of north-west
    > >Semitic scripts, which may apply also to others e.g. Greek, Coptic
    > >and archaic Greek - possibly also the Indic scripts.
    > I don't think so.
    > >Within these sets of scripts there is NO ambiguity about which
    > >characters correspond to which, as they have identical repertoires,
    > >with possibly additional letters in some of the scripts for which no
    > >equivalent can be defined in the other scripts.
    > That doesn't mean that an ordered list with them interfiled is in any
    > way legible.

    I do agree. UCA is designed first to produce a legible and consistent ordering
    for various kinds of readers, both experts and ordinary users who can read only
    one language or one script. We can interleave some variants that have an obvious
    relation to other well-known characters (accented letters are good examples,
    even if some may wonder why the thorn letters sit between T and U; since these
    letters are rare even in the languages that use them, this interleaving of
    variants does not make the ordering completely unreadable).

    For search purposes, what some want is not really a collation order but
    equivalence relations. This is the same kind of need as case folding or
    case-insensitive searches.
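
    To illustrate the distinction, here is a minimal Python sketch of a search
    that relies on an equivalence relation rather than on an ordering. The names
    fold_key, equivalent and search are hypothetical, and the fold itself
    (canonical decomposition, removal of combining marks, then case folding) is
    only an assumed example of such a relation, not a standard one.

```python
import unicodedata


def fold_key(s: str) -> str:
    """Map a string to a representative of its equivalence class.

    Assumed fold: canonical decomposition, removal of combining marks,
    then Unicode case folding. It defines which strings match; it does
    not define how they should be ordered.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def equivalent(a: str, b: str) -> bool:
    # Two strings are equivalent when they fold to the same key.
    return fold_key(a) == fold_key(b)


def search(entries: list[str], query: str) -> list[str]:
    # Return the entries equivalent to the query, in their original order.
    key = fold_key(query)
    return [e for e in entries if fold_key(e) == key]
```

    Nothing here says where a matching entry belongs in a sorted list; that
    remains the job of the collation, which can stay legible for its readers.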

    I see no objection to adding new types of string folding for those who would
    like to "fold" (in fact transliterate) Phoenician to Hebrew (the reverse being
    hard to implement consistently because of the various sets of Hebrew
    diacritics), or to Greek. There could even be a standard guideline for
    implementing such folding or transliteration (just as standard folding rules
    exist for case in Latin/Greek/Cyrillic or for Katakana-to-Hiragana in Japanese).
    Such folding belongs to the same area, with the same caveats (in terms of text
    interpretation), as custom normalizations or compatibility normalizations
    performed on unknown input text: some linguistic meaning is lost.
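
    A one-way fold of this kind could be sketched as follows in Python. This is
    only an illustration under stated assumptions: it uses the code points of the
    Phoenician block as it was later encoded (U+10900..U+10915, 22 letters), pairs
    each letter with the non-final Hebrew letter in the same alphabetical
    position, and ignores Phoenician numbers and punctuation. The names
    HEBREW_NONFINAL, PHOENICIAN_TO_HEBREW and fold_phoenician are hypothetical.

```python
# Non-final Hebrew letters, alef through tav (final forms are skipped).
HEBREW_NONFINAL = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]

# Assumption: the 22 Phoenician letters occupy U+10900..U+10915 in the same
# alphabetical order, so position i maps to HEBREW_NONFINAL[i].
PHOENICIAN_TO_HEBREW = {0x10900 + i: cp for i, cp in enumerate(HEBREW_NONFINAL)}


def fold_phoenician(text: str) -> str:
    """Fold Phoenician letters to Hebrew; leave every other character alone.

    The fold is lossy: the result no longer records which script the input
    used, and no word-final forms are produced.
    """
    return text.translate(PHOENICIAN_TO_HEBREW)
```

    Once folded, a Phoenician string and the corresponding Hebrew string compare
    equal, which is what a search needs, but the original script cannot be
    recovered from the result. The reverse mapping is not attempted, since Hebrew
    points and final forms have no Phoenician counterpart.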
