From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 24 2003 - 19:18:15 EST
"Michael Everson" <everson@evertype.com>
> We have encoded 70,000 of them.
It all depends on how you define characters. Most ideographs are composed,
but Unicode and the CJK unification working groups have so far failed to
agree on a coherent definition of how these characters actually compose, so
we are still witnessing an ever-growing number of compound ideographs,
created every day by Han users.
If Latin characters were counted the way Han is, we would probably reach a
similar (maybe even larger) number of composed "characters". It is just
unfortunate that Han lacks a way to describe its composition model. (This
used to be the case for the Hangul alphabet too, but recent work seems to
demonstrate that the complexity of Hangul in Unicode is only superficial,
and that the encoding overlooks the actual usage and rules inherent to the
script.)
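To illustrate the difference in composition models, here is a minimal
Python sketch using the standard unicodedata module. The helper name
decompose and the three sample characters are only illustrative choices.
It shows that a precomposed Latin letter and a Hangul syllable both have a
canonical decomposition into smaller encoded units, while a Han ideograph
has none, even when it is visibly built from components.

```python
import unicodedata


def decompose(ch: str) -> list[str]:
    """Return the code points of the canonical decomposition (NFD) of ch."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


# Illustrative samples: one precomposed character per script.
samples = {
    "Latin": "\u00E9",   # e with acute accent
    "Hangul": "\uD55C",  # syllable made of three jamo
    "Han": "\u597D",     # ideograph visibly made of two components
}

for script, ch in samples.items():
    print(script, decompose(ch))
```

The Latin letter splits into a base letter plus a combining accent and the
Hangul syllable into three jamo, but the Han ideograph comes back
unchanged: the standard defines no canonical decomposition for it.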
I'm sad to say that I really think the Unicode character model is very
weak except for LTR alphabetic scripts like Latin, Greek and Cyrillic... And
this in turn affects the W3C character model as well. New concepts are
needed to correctly handle the actual properties of languages used by
billions of people who are not used to the English language, or to the
Unicode formalism and working methods.