Digits in character names (was: Re: character names (questions))

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Apr 06 2000 - 21:04:31 EDT


Michael Everson responded to John Cowan:

> At 13:43 -0800 2000-04-06, John Cowan wrote:
> >Michael Everson wrote:
> >
> >> Not quite. Digits are disallowed in word-initial position within character
> >> names, and while they are presently pretty much only used to represent
> >> shapes, there are lots of character names proposed for Plane 1 (Egyptian
> >> Hieroglyphs, Indus, Rongorongo, etc.) where the character name is really
> >> its catalogue number (to be preceded by a letter of some sort; I think I
> >> used H000-H999 for Indus (Harappan), R for Rongorongo, etc).
> >
> >Yes. I meant "Digits are (de jure) discouraged, and are (de facto) only
> >used to represent their shapes."
>
> Well, rather "have hitherto (de facto) only been used to represent their
> shapes".
>
> Except of course in the CJK character names. :-)
>

There are actually 3 current usages of digits in the (English)
normative character names:

1. "9" appears in the names for the directional quote marks, U+201A,
   U+201B, U+201E, U+201F. Those constitute the original exception for
   use of a digit to represent a shape, as in "LOW-9" and
   "HIGH-REVERSED-9".

   (This was, in my opinion, simply a name hack, and may originally
   have been motivated as a kind of name annotation to prevent
   translators of character standards from translating the "9" into
   something else.)

2. Digits representing the UCS code point occur in both the unified CJK
   characters and in the compatibility CJK characters:

   CJK UNIFIED IDEOGRAPH-4E00, etc., etc.
   CJK COMPATIBILITY IDEOGRAPH-F938, etc., etc.

3. Digits representing the dot positions occur in the Braille pattern
   characters:

   U+28FB BRAILLE PATTERN DOTS-1245678, etc., etc.
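
The digits in cases 2 and 3 are computed from the code point rather
than describing a shape. A minimal Python sketch of that derivation
follows. The function names are hypothetical, and the Braille rule
(dot n corresponds to bit n-1 of the offset from U+2800) is stated
here as an assumption that the final lines check against Python's
own unicodedata tables.

```python
import unicodedata


def cjk_unified_name(cp: int) -> str:
    """Fixed prefix plus the code point in uppercase hexadecimal."""
    return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"


def cjk_compatibility_name(cp: int) -> str:
    """Same pattern, with the compatibility prefix."""
    return f"CJK COMPATIBILITY IDEOGRAPH-{cp:04X}"


def braille_name(cp: int) -> str:
    """List the raised dots encoded in the low 8 bits of the offset."""
    bits = cp - 0x2800
    if not 0 <= bits <= 0xFF:
        raise ValueError("not a Braille pattern code point")
    if bits == 0:
        # The empty cell has no dots to list.
        return "BRAILLE PATTERN BLANK"
    # Assumption: dot n is raised when bit n-1 is set.
    dots = "".join(str(n) for n in range(1, 9) if bits & (1 << (n - 1)))
    return f"BRAILLE PATTERN DOTS-{dots}"


# Cross-check the sketch against the names shipped with Python.
assert cjk_unified_name(0x4E00) == unicodedata.name(chr(0x4E00))
assert cjk_compatibility_name(0xF938) == unicodedata.name(chr(0xF938))
assert braille_name(0x28FB) == unicodedata.name(chr(0x28FB))
```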

And one more usage in the character names accepted in ISO/IEC 10646-2:

4. In the Mathematical Alphanumeric Symbols, for math styled digits:

   U-0001D7CF MATHEMATICAL BOLD DIGIT 1, etc., etc.

The last three cases have broken the "only to represent their shapes"
exception wide open. And I think it is only a matter of time until
WG2 also accepts the convention of using numeric catalog values for the
character entities of some historic scripts, where those catalog
values would be more useful to the users of the standard than some
arbitrarily concocted alpha-only names that avoid the use of digits
in the name.
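
The word-initial restriction quoted above can be tested mechanically.
The sketch below is only an illustration: the helper names are
hypothetical, and it assumes a "word" is delimited by spaces, so that
the "9" in "LOW-9" does not count as word-initial. "H000" stands in
for a catalogue-number name of the kind Michael describes and is not
an actual character name.

```python
import re

# Assumption: words are separated by spaces only; a hyphen does not
# start a new word.
_WORD_INITIAL_DIGIT = re.compile(r"(?:^| )[0-9]")


def has_digit(name: str) -> bool:
    """True if the name contains any ASCII digit."""
    return any(ch in "0123456789" for ch in name)


def has_word_initial_digit(name: str) -> bool:
    """True if any space-delimited word in the name starts with a digit."""
    return _WORD_INITIAL_DIGIT.search(name) is not None


names = [
    "DOUBLE LOW-9 QUOTATION MARK",
    "CJK UNIFIED IDEOGRAPH-4E00",
    "BRAILLE PATTERN DOTS-1245678",
    "MATHEMATICAL BOLD DIGIT 1",
    "H000",  # hypothetical catalogue-style name
]

for name in names:
    print(name, has_digit(name), has_word_initial_digit(name))
```

Under that space-delimited reading, only "MATHEMATICAL BOLD DIGIT 1",
as written in this message, would put a digit in word-initial
position.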

--Ken


