Re: 2 Tibetan characters and one alias do not obey names rules

From: Andrew West
Date: Fri Mar 19 2010 - 15:40:35 CST

    On 19 March 2010 17:41, Asmus Freytag wrote:
    >> 24.2 Name formation
    >> An entity names shall consist only of the following characters
    >> • DIGIT ZERO through DIGIT NINE,
    >> • SPACE,
    >> • HYPHEN-MINUS, and
    >> • FULL STOP if the entity being named is a collection
    >> The first character in an entity name shall be a Latin capital letter.
    > The actual rules for *character* names have always been that the
    > first character in a *word* (i.e. following a space, or start of name)
    > must be a Latin capital letter, except that hyphen-minus may start any
    > word but the first.

    ISO/IEC 10646:2003 Annex L Rule 1 states that "Names of characters
    may also include digits 0 to 9 (provided that a digit is not the first
    character in a word)" which agrees with what you say.

    > I've not been aware that this was changed deliberately, so to me, the
    > above statement of the rules seem to contain an editing mistake resulting
    > from their recent reformulation.

    I think you are right. There seems to be an omission here, with the
    result that these rules do not concur with R2 in the TUS text.

    R2 Digits do not occur as the first character of a character name, nor
    immediately following a space character.
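
    For illustration only, the rules discussed so far could be checked
    mechanically along the following lines. This is a rough sketch, not
    text from either standard: the function name name_rule_violations and
    the name "EXAMPLE 2 SIGN" are made up, and Latin capital letters are
    assumed to be in the permitted set alongside the characters quoted
    from 24.2.

```python
import re

# Assumed repertoire: A-Z plus the digits, SPACE and HYPHEN-MINUS quoted from 24.2.
ALLOWED = re.compile(r"[A-Z0-9 -]+")

def name_rule_violations(name):
    """Return a list of rule violations for a candidate character name."""
    if not name:
        return ["name is empty"]
    problems = []
    if not ALLOWED.fullmatch(name):
        problems.append("character outside A-Z, 0-9, SPACE, HYPHEN-MINUS")
    if not ("A" <= name[0] <= "Z"):
        problems.append("first character is not a Latin capital letter")
    # R2: a digit may not start the name or immediately follow a space.
    if re.search(r"(?:^| )[0-9]", name):
        problems.append("digit at the start of a word (R2)")
    return problems

# Hypothetical names, used only to exercise the checks.
print(name_rule_violations("EXAMPLE SIGN 2A"))
print(name_rule_violations("EXAMPLE 2 SIGN"))
```

    Note that a hyphen-minus at the start of a non-initial word is
    deliberately not flagged here, in line with the word-initial rule
    described above.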

    Luckily there is still time to fix that in the FCD ballot.

    >> The last character in an entity name shall be either a Latin capital
    >> letter or a Digit.
    > This seems to needlessly rule out a hypothetical
    > While this may not occur as a part of Tibetan characters
    > as far as they have been encompassed, it looks like an
    > unnecessary restriction in the face of future naming
    > requirements for this and other scripts.

    A character name such as TIBETAN LETTER A- is theoretically possible,
    and prohibiting final hyphen-minus may be unnecessarily restrictive if
    someone wants to propose a character naming convention for some as yet
    unencoded script that uses hyphen-minus to represent an orthographic
    glottal stop (this is something that has been discussed recently for
    one particular script, although in that case glottal stop only occurs
    initially). On the other hand, if this restriction has been around for
    a long time, and implementations expect only A-Z/0-9 at the end of a
    character name, then it is probably best to live with this
    restriction. I personally don't think it is necessary to change this
    rule on the off chance that final hyphen-minus may be useful at some
    future date.
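
    To make the effect of the final-character rule concrete, here is a
    small sketch of the check an implementation might apply. The helper
    name ends_legally is invented for this example, and the check covers
    only the last-character restriction quoted above, nothing else.

```python
import string

# Final character must be a Latin capital letter or a digit (rule quoted above).
FINAL_OK = set(string.ascii_uppercase + string.digits)

def ends_legally(name):
    """True if the last character of the name is A-Z or 0-9."""
    return bool(name) and name[-1] in FINAL_OK

# The theoretical name discussed above is rejected by this rule.
print(ends_legally("TIBETAN LETTER A-"))
print(ends_legally("TIBETAN LETTER A"))
```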

