Unique character names

From: Doug Ewell (dewell@roadrunner.com)
Date: Sat Dec 01 2007 - 12:17:40 CST


    UAX #34 has the following to say about unique naming of character
    sequences, and by extension, characters:

    > R3: Like character names, names for sequences are unique if they are
    > different even when SPACE and medial HYPHEN-MINUS characters are
    > ignored, and when the strings “LETTER”, “CHARACTER”, and “DIGIT” are
    > ignored in comparison of the names.
    >
    > The following two character names are exceptions to this rule, because
    > they were created before this rule was specified:
    >
    > 116C HANGUL JUNGSEONG OE
    > 1180 HANGUL JUNGSEONG O-E
    >
    > Examples of unacceptable names that are not unique:
    >
    > SARATI LETTER AA
    > SARATI CHARACTER AA

    I'm wondering if this rule applies to the string "LETTER" in the
    following character names:

    U+210C BLACK-LETTER CAPITAL H
    U+2111 BLACK-LETTER CAPITAL I
    U+211C BLACK-LETTER CAPITAL R
    U+2128 BLACK-LETTER CAPITAL Z
    U+212D BLACK-LETTER CAPITAL C

    In other words, would a hypothetical character name "BLACK CHARACTER
    CAPITAL H" violate this rule?
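
    The answer seems to depend on whether "LETTER" is ignored only as a
    whole space-delimited word or wherever it occurs as a substring. The
    quoted rule does not say which. A minimal Python sketch of both
    readings follows. The function names, and the treatment of a "medial"
    hyphen as one sitting directly between two letters or digits, are
    assumptions made for illustration and are not taken from UAX #34.

```python
import re

# Strings that rule R3 says are ignored when comparing names.
IGNORED = ("LETTER", "CHARACTER", "DIGIT")


def strip_medial_hyphens(name):
    # Assumption: "medial" means a hyphen directly between two
    # letters or digits, as in "O-E" or "BLACK-LETTER".
    return re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name)


def key_by_word(name):
    # Reading 1: drop an ignored string only when it is a whole
    # space-delimited word, then drop medial hyphens and spaces.
    words = [w for w in name.split() if w not in IGNORED]
    return strip_medial_hyphens(" ".join(words)).replace(" ", "")


def key_by_substring(name):
    # Reading 2: drop medial hyphens and spaces first, then delete
    # every occurrence of an ignored string, wherever it falls.
    key = strip_medial_hyphens(name).replace(" ", "")
    for ignored in IGNORED:
        key = key.replace(ignored, "")
    return key


if __name__ == "__main__":
    pairs = [
        ("SARATI LETTER AA", "SARATI CHARACTER AA"),
        ("HANGUL JUNGSEONG OE", "HANGUL JUNGSEONG O-E"),
        ("BLACK-LETTER CAPITAL H", "BLACK CHARACTER CAPITAL H"),
    ]
    for a, b in pairs:
        print(a, "|", b)
        print("  whole-word reading collides:", key_by_word(a) == key_by_word(b))
        print("  substring reading collides: ", key_by_substring(a) == key_by_substring(b))
```

    Under both readings the SARATI pair and the HANGUL JUNGSEONG pair
    collide, which matches the examples in the quoted text. The two
    readings disagree only on the BLACK-LETTER pair: the whole-word
    reading keeps "BLACK-LETTER" intact and treats the two names as
    distinct, while the substring reading reduces both to the same key.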

    (This is not meant as a joke, by the way; I'm playing around with
    algorithms for efficient storage of character names.)

    --
    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
    http://home.roadrunner.com/~dewell
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

