From: Kenneth Whistler (firstname.lastname@example.org)
Date: Tue Dec 04 2007 - 13:31:54 CST
> > I'm wondering if this rule applies to the string "LETTER" in the
> > following character names:
> > U+210C BLACK-LETTER CAPITAL H
> > U+2111 BLACK-LETTER CAPITAL I
> > U+211C BLACK-LETTER CAPITAL R
> > U+2128 BLACK-LETTER CAPITAL Z
> > U+212D BLACK-LETTER CAPITAL C
> It most certainly does.
> > In other words, would a hypothetical character name "BLACK CHARACTER
> > CAPITAL H" violate this rule?
> > (This is not meant as a joke, by the way; I'm playing around with
> > algorithms for efficient storage of character names.)
> Believe it or not, the Consortium uses software to make sure that these
> rules are followed.
> PS: I know, I wrote one of the tools used in checking drafts of the
> nameslist during my tenure as code chart editor.
In addition to the check that Asmus wrote into the tools for
checking drafts of the names list, I have an independently
written tool that also checks for code point and name
duplications. It catches violations of the loose name matching
rules as well as out-and-out duplications, which can creep in
during names list preparation, since editing the initial names
list for a proposal typically involves a lot of copy/paste.
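To make the idea concrete, here is a minimal Python sketch of such a duplication check. It is an illustration only, not the tool described here. It assumes the loose rule ignores case, whitespace, underscores, medial hyphens, and the strings CHARACTER, LETTER and DIGIT, which fits the examples in this thread but is not quoted from the standard; `loose_key` and `find_duplicates` are hypothetical names.

```python
import re
from collections import defaultdict

# ASSUMPTION: loose matching ignores case, whitespace, underscores,
# medial hyphens, and the strings CHARACTER, LETTER and DIGIT. This is
# inferred from the examples in this thread, not quoted from the standard.
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")
_SPACE_OR_UNDERSCORE = re.compile(r"[\s_]+")
_IGNORED_WORDS = ("CHARACTER", "LETTER", "DIGIT")


def loose_key(name):
    """Reduce a name to the form compared under the (assumed) loose rule."""
    key = _MEDIAL_HYPHEN.sub("", name.upper())
    key = _SPACE_OR_UNDERSCORE.sub("", key)
    for word in _IGNORED_WORDS:
        key = key.replace(word, "")
    return key


def find_duplicates(entries):
    """Check (code, name) pairs for three kinds of duplication.

    Returns (code_dups, name_dups, loose_dups):
      code_dups  - a code point listed more than once
      name_dups  - an identical name used more than once (copy/paste slips)
      loose_dups - distinct names that collide under loose_key
    """
    by_code = defaultdict(list)
    by_name = defaultdict(list)
    by_key = defaultdict(set)
    for code, name in entries:
        by_code[code].append(name)
        by_name[name].append(code)
        by_key[loose_key(name)].add(name)
    code_dups = {c: names for c, names in by_code.items() if len(names) > 1}
    name_dups = {n: codes for n, codes in by_name.items() if len(codes) > 1}
    loose_dups = {k: sorted(names) for k, names in by_key.items() if len(names) > 1}
    return code_dups, name_dups, loose_dups
```

Keeping exact duplicates separate from loose collisions mirrors the two classes of problem mentioned above: a line pasted twice shows up in `name_dups` (and in `code_dups` if the code point was copied too), while a near-miss name shows up only in `loose_dups`.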
Note that the scope for name duplication is the union
of the *character* names in UnicodeData.txt (and the
generated, but non-problematical unified ideograph
names and Hangul syllable names) and the *named sequences*
in NamedSequences.txt. So the checking has to be done
for both of those together, and not just in UnicodeData.txt.
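A sketch of pooling both sources into one namespace before checking might look like this. It assumes the semicolon-delimited layouts of UnicodeData.txt (code point in field 0, name in field 1) and NamedSequences.txt (name, then the code point sequence, with `#` comments). Angle-bracketed entries such as `<control>` and the range markers for the generated ideograph and Hangul names are skipped. All function names are hypothetical, and `loose_key` encodes an assumed rule (case, whitespace, underscores, medial hyphens, and CHARACTER/LETTER/DIGIT ignored).

```python
import re
from collections import defaultdict
from itertools import chain


def loose_key(name):
    # ASSUMED rule: ignore case, whitespace, underscores, medial hyphens,
    # and the strings CHARACTER, LETTER and DIGIT.
    key = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    key = re.sub(r"[\s_]+", "", key)
    for word in ("CHARACTER", "LETTER", "DIGIT"):
        key = key.replace(word, "")
    return key


def unicode_data_names(lines):
    """Yield (code point, name) from UnicodeData.txt-style lines."""
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue
        name = fields[1]
        # Skip "<control>" and "<CJK Ideograph, First>"-style placeholders;
        # the names for those ranges are generated rather than listed.
        if name.startswith("<"):
            continue
        yield fields[0], name


def named_sequence_names(lines):
    """Yield (code point sequence, name) from NamedSequences.txt-style lines."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if ";" not in line:
            continue
        name, sequence = line.split(";", 1)
        yield sequence.strip(), name.strip()


def collisions_across_sources(unicode_data_lines, named_sequence_lines):
    """Return groups of distinct names that collide in the shared namespace."""
    groups = defaultdict(set)
    for _, name in chain(unicode_data_names(unicode_data_lines),
                         named_sequence_names(named_sequence_lines)):
        groups[loose_key(name)].add(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

The point of `chain` here is that a named sequence can collide with a character name just as easily as two character names can, so a check over UnicodeData.txt alone would miss that case.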
Run against the current UnicodeData-5.1.0.txt and
NamedSequences-5.1.0.txt, that tool correctly detects
the one grandfathered exception to the loose name matching
rule:
116C HANGUL JUNGSEONG OE
1180 HANGUL JUNGSEONG O-E
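The only thing separating this pair is the medial hyphen, so any key that ignores medial hyphens must merge them. One way a checker might still report the pair, while keeping it apart from genuinely new violations, is sketched below. The allowlist `KNOWN_EXCEPTIONS` and the functions `hyphen_space_key` and `classify` are hypothetical, and only the case, whitespace, underscore and medial-hyphen parts of the loose rule are modelled.

```python
import re

# Hypothetical allowlist of grandfathered collisions; the pair below is
# the one exception named in the text.
KNOWN_EXCEPTIONS = {
    frozenset({"HANGUL JUNGSEONG OE", "HANGUL JUNGSEONG O-E"}),
}


def hyphen_space_key(name):
    """Ignore case, whitespace, underscores and medial hyphens only."""
    key = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    return re.sub(r"[\s_]+", "", key)


def classify(names):
    """Split colliding name groups into (known exceptions, new violations)."""
    groups = {}
    for name in names:
        groups.setdefault(hyphen_space_key(name), set()).add(name)
    known, new = [], []
    for group in groups.values():
        if len(group) < 2:
            continue
        target = known if frozenset(group) in KNOWN_EXCEPTIONS else new
        target.append(sorted(group))
    return known, new
```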
I also run the tool against an artificially hacked-up
UnicodeData.txt containing various bogus extra characters,
with names like SUBSCRIPT DIGIT THREE and
SUBSCRIPTDIGIT THREE (as opposed to the actually encoded
SUBSCRIPT THREE), to verify that the tool detects other
possible classes of duplication.
I just added "BLACK CHARACTER CAPITAL H" to the test file,
and it popped right out as a violation of the loose name
matching rule.
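The test-file approach can be mimicked with a small harness: take names that are actually encoded, add deliberately bogus ones, and confirm that each bogus name is flagged against the name it imitates. This sketch assumes the same loose rule the thread's examples imply (case, whitespace, underscores, medial hyphens, and CHARACTER/LETTER/DIGIT ignored); `check_candidates` is a hypothetical helper, not part of any Unicode tooling.

```python
import re


def loose_key(name):
    # ASSUMED rule, inferred from the examples in this thread: ignore case,
    # whitespace, underscores, medial hyphens, and CHARACTER/LETTER/DIGIT.
    key = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    key = re.sub(r"[\s_]+", "", key)
    for word in ("CHARACTER", "LETTER", "DIGIT"):
        key = key.replace(word, "")
    return key


def check_candidates(encoded, candidates):
    """Map each candidate name to the encoded name it loosely duplicates."""
    index = {loose_key(name): name for name in encoded}
    return {c: index[loose_key(c)] for c in candidates if loose_key(c) in index}


ENCODED = ["SUBSCRIPT THREE", "BLACK-LETTER CAPITAL H"]
BOGUS = ["SUBSCRIPT DIGIT THREE", "SUBSCRIPTDIGIT THREE", "BLACK CHARACTER CAPITAL H"]

if __name__ == "__main__":
    for bogus, original in check_candidates(ENCODED, BOGUS).items():
        print(f"{bogus!r} collides with {original!r}")
```

A harness like this is only as good as its bogus list, so each new class of near-miss (an inserted ignorable word, a dropped space, a swapped hyphen) deserves its own entry.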
This archive was generated by hypermail 2.1.5 : Tue Dec 04 2007 - 13:35:34 CST