Date: Thu Mar 09 2000 - 07:00:31 EST

Otto Stolz wrote:
> - Is there any difference (and, if so, what) between the following
> Hangzhou-style numerals and "ordinary" CJK characters?
> U+3021 vs. U+4E28
> U+3024 vs. U+4E42
> U+3026 vs. U+4EA0
> U+3029 vs. U+5973

The two groups are totally unrelated: the "Hang"zhou characters are used as
numerals by shopkeepers in some places of China; the right-hand characters
are ideographs, and their similarity is only accidental.

These coincidences may happen with any pair of unrelated scripts, especially
with such simple shapes. E.g.: U+3021 and U+4E28 also look like letter "I"
(Latin, Greek, Cyrillic), letter alif (Arabic), digit "1" (Roman, Arabic,
European), and many other vertical-bar-shaped characters in many scripts...
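
To make the point concrete, here is a minimal sketch (assuming a modern
Python with its standard unicodedata module; the helper name describe is
made up for illustration) that prints the code point, general category and
name of a few of these look-alikes. The shapes may coincide, but the
character properties do not:

```python
import unicodedata

# A few vertical-bar look-alikes: Hangzhou numeral one, the CJK ideograph,
# Latin capital I, Arabic alef, and the European digit one.
LOOKALIKES = ["\u3021", "\u4E28", "I", "\u0627", "1"]

def describe(ch):
    """Return (code point, general category, character name) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.name(ch, "<unnamed>"),
    )

for ch in LOOKALIKES:
    print(*describe(ch))
```

In this sketch U+3021 is reported as a letter-like number (category Nl),
while U+4E28 is an ordinary letter (category Lo), which is the property-level
trace of the two characters being unrelated.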

The first three ideographs, as far as I know, have a mostly
"meta-linguistic" usage: i.e. they are *components* of ideographs, used to
write *about* the shape of ideographs (compare the index section of any CJK
dictionary to see them used).

The last ideograph means "woman", "female". Notice that it is actually quite
different from U+3029: in U+5973 the horizontal bar extends to the right
part, while in U+3029 it doesn't.

> - Why are there no cross references in the Hangzhou-style
> Numerals block
> - Would an explanation of these numerals be appropriate on p. 6-94?

I don't have my 2.0 book at hand. But what would you cross-reference
"Hang"zhou numerals with?

> - Are there any more almost-homographs I haven't found?

Plenty! For example, have a look at blocks U+2E80..U+2EF3 (CJK Radicals),
U+2F00..U+2FD5 (Kangxi Radicals), U+F900..U+FA2D (CJK Compatibility
Ideographs): most characters in these blocks are identical to ideographs in
the regular U+4E00..U+9FA5 (CJK Unified Ideographs).
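
One way to see the relationship mechanically is through the normalization
forms. The following is only a sketch, assuming Python's standard
unicodedata module; the two sample characters are my own picks, and not
every character in those blocks has such a mapping:

```python
import unicodedata

kangxi_one = "\u2F00"   # KANGXI RADICAL ONE, looks like U+4E00
compat_ideo = "\uF900"  # first CJK Compatibility Ideograph, looks like U+8C48

# The Kangxi radical only has a *compatibility* mapping, so NFC keeps it
# and NFKC folds it to the unified ideograph.
radical_nfc = unicodedata.normalize("NFC", kangxi_one)
radical_nfkc = unicodedata.normalize("NFKC", kangxi_one)

# The compatibility ideograph has a *canonical* mapping, so even NFC
# replaces it with the unified ideograph.
ideo_nfc = unicodedata.normalize("NFC", compat_ideo)

for label, s in [("radical NFC", radical_nfc),
                 ("radical NFKC", radical_nfkc),
                 ("compat ideograph NFC", ideo_nfc)]:
    print(label, f"U+{ord(s):04X}")
```

So the clone and the regular ideograph stay distinct in plain text, and
are only folded together when a normalization step is asked for.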

I have learned from this list that the reasons for having clones are
various, the main one being "source separation": to allow round-trip
conversions, characters that had separate code points in the standards
that Unicode was based on kept separate code points in Unicode.
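
As a hedged illustration of the round-trip argument: the sketch below
assumes Python's built-in euc_kr codec and takes one ideograph that, as
far as I know, is encoded more than once in the Korean standard (the
choice of U+F914 and U+6A02 is mine, not from the book):

```python
import unicodedata

compat = "\uF914"   # compatibility clone
unified = "\u6A02"  # regular unified ideograph with the same shape

# Each character maps to its own byte sequence in the legacy encoding.
compat_bytes = compat.encode("euc_kr")
unified_bytes = unified.encode("euc_kr")

# Converting back yields the character we started from, so no
# information is lost in a legacy -> Unicode -> legacy round trip.
compat_back = compat_bytes.decode("euc_kr")
unified_back = unified_bytes.decode("euc_kr")

# Normalizing would merge the two and break that round trip.
merged = unicodedata.normalize("NFC", compat)

print(compat_bytes.hex(), unified_bytes.hex())
print(compat_back == compat, unified_back == unified, merged == unified)
```

Had Unicode unified the two, both byte sequences would decode to one code
point and the original bytes could not be recovered.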

_ Marco

