VOWEL, CONSONANT, ...: allow recognition of shorter names?

From: Henrik Theiling (ht@theiling.de)
Date: Fri Apr 11 2008 - 04:38:30 CDT



    TR#34 states that all character and sequence names (except one pair
    involving HANGUL JUNGSEONG O-E) will always be unique when space,
    medial dash and the words LETTER, CHARACTER, and DIGIT are ignored.

    When writing a character name recognition algorithm, I would like to
    let the user be as concise as possible, yet without violating Unicode
    rules, and without being in potential conflict with upcoming versions
    of Unicode. As I understand it, the rule that LETTER, CHARACTER,
    DIGIT, spaces, and medial dashes can be ignored in comparison tries to
    address this very idea.
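
    To make the rule above concrete, here is a minimal sketch of such a
    comparison in Python. The names loose_key, build_index and resolve are
    hypothetical helpers invented for this illustration, not part of any
    Unicode library, and I am assuming that "medial dash" means a hyphen
    with a letter or digit on both sides.

```python
import re

# Words that the rule described above allows a matcher to ignore.
IGNORED_WORDS = frozenset({"LETTER", "CHARACTER", "DIGIT"})

def loose_key(name, ignored_words=IGNORED_WORDS):
    """Reduce a character name to a key for loose comparison."""
    # Drop medial hyphens only (assumption: letter/digit on both sides).
    # A hyphen next to a space, as in "TSA -PHRU", is kept.
    text = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    # Drop the ignorable words, then drop the spaces by joining.
    return "".join(w for w in text.split() if w not in ignored_words)

def build_index(names):
    """Map each loose key to its full name."""
    return {loose_key(n): n for n in names}

def resolve(user_input, index):
    """Return the full name matching the user's short form, or None."""
    return index.get(loose_key(user_input))

index = build_index(["KHMER LETTER KA", "KHMER DIGIT ONE"])
print(resolve("khmer ka", index))
print(resolve("Khmer One", index))
```

    Note that the known exception shows up here as expected: HANGUL
    JUNGSEONG O-E and HANGUL JUNGSEONG OE reduce to the same key, so a
    real matcher would have to special-case that pair.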

    I noticed that for some scripts, e.g. Khmer, character names are still
    a mouthful. I also noticed that when I additionally ignored
    CONSONANT, VOWEL, and INDEPENDENT, the Unicode names are still unique
    and it would improve writing (at least) Khmer character names a lot.

    I was wondering whether it would be feasible to tighten the condition
    in TR#34 so that no upcoming Unicode version would have ambiguous names
    if CONSONANT, VOWEL, and INDEPENDENT were ignored, too.
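
    A check of this kind can be sketched as follows. This is only an
    illustration under assumptions: find_collisions and
    all_character_names are hypothetical helper names, the name list comes
    from Python's unicodedata module (so it reflects whatever Unicode
    version that Python ships, and it does not include named sequences),
    and "medial dash" is taken to mean a hyphen with a letter or digit on
    both sides.

```python
import re
import sys
import unicodedata
from collections import defaultdict

BASE_IGNORED = frozenset({"LETTER", "CHARACTER", "DIGIT"})

def loose_key(name, ignored):
    # Remove medial hyphens, ignorable words, and spaces.
    text = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    return "".join(w for w in text.split() if w not in ignored)

def find_collisions(names, extra_ignored=()):
    """Return {key: [names]} for every key shared by two or more names."""
    ignored = BASE_IGNORED | set(extra_ignored)
    groups = defaultdict(set)
    for name in names:
        groups[loose_key(name, ignored)].add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

def all_character_names():
    """Yield every character name known to this Python's unicodedata."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            yield name

if __name__ == "__main__":
    extra = {"CONSONANT", "VOWEL", "INDEPENDENT"}
    for key, names in find_collisions(all_character_names(), extra).items():
        print(key, names)
```

    Running this against a newer Unicode version than the one current at
    the time of writing may well report collisions that did not exist
    then, which is exactly the stability concern.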

    Of course, there may be more ignorable words, so the question is where
    to stop. 'VOWEL' occurs in 360 names, which is more than 'CHARACTER',
    which occurs in only 106. CONSONANT and INDEPENDENT, by contrast, are
    relatively rare. Here are these three, together with a few other
    frequently occurring words, all of which can currently be ignored
    without ambiguity:

        VOWEL in 360 names
        CONSONANT in 66 names
        INDEPENDENT in 19 names (rare, but also a mouthful)
        SYLLABICS in 630 names
        LIGATURE in 508 names
        FORM in 798 names
        PATTERN in 297 names
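
    Counts like these can be reproduced roughly as sketched below. The
    helper name word_frequencies is hypothetical, words are split on
    spaces only (an assumption about how the figures above were obtained),
    and the numbers printed depend on the Unicode version bundled with
    Python, so they will generally not match the list above.

```python
import sys
import unicodedata
from collections import Counter

def word_frequencies(names):
    """Count, for each word, the number of names that contain it."""
    counts = Counter()
    for name in names:
        # Use a set so a word repeated within one name counts once.
        counts.update(set(name.split()))
    return counts

def all_character_names():
    """Yield every character name known to this Python's unicodedata."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            yield name

if __name__ == "__main__":
    counts = word_frequencies(all_character_names())
    for word in ("VOWEL", "CONSONANT", "INDEPENDENT", "SYLLABICS",
                 "LIGATURE", "FORM", "PATTERN"):
        print(f"{word} in {counts[word]} names")
```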

    For stability reasons, it would be very nice to know that upcoming
    Unicode versions will retain this unambiguity, because then I could
    officially ignore those words and my users could enjoy more concise
    character names.

