Date: Mon Jan 22 2007 - 21:35:28 CST

    The Wenlin Institute, for example, has a long list of Chinese
    characters, each associated with a CDL (Character Description
    Language) description.

    This is, however, just the tip of the iceberg. The way Unicode defines
    a character means that, for Chinese, it is a little like requiring
    every word to be agreed upon before it can be encoded.

    Estimates of the number of Chinese-type characters vary; here are some:

    Known characters:
    Present Unicode: 75 000
    Submitted in some form, including draft proposals: about 30 000 (my
    estimate)
    Other characters being researched and suitable for encoding: 10 000 or
    more (I know of about 10 000; on top of this there is the Chu Nom list
    still to come, and others)

    Total: 115 000 plus (115 000 is definitely too low)
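
    The known figures above simply add up. A short Python tally, using the
    numbers exactly as given (the second is an estimate and the third only a
    lower bound, so the sum is a minimum rather than a count):

```python
# Tally of the figures listed above. The second figure is an estimate and
# the third is a lower bound, so the total is a minimum, not a final count.
known = {
    "present Unicode": 75_000,
    "submitted, including draft proposals": 30_000,
    "being researched, suitable for encoding": 10_000,
}

total = sum(known.values())
print(f"Total: {total:,} plus")  # Total: 115,000 plus
```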

    Other systems:
    TRON (Japanese) apparently encodes at least 175 000 Chinese-type
    characters.

    Guesstimates and possibilities:

    China has over 200 languages; if each language used 5000 unique
    characters, the total would be 1 000 000 (one million!).

    Chinese characters use approximately 200 parts, or radicals. If each
    character is made of exactly 3 parts, there are 200 x 200 x 200 =
    8 000 000 (8 million) possibilities.

    The average writer of Chinese knows about 5000 characters; if that
    person made new characters by combining just two of them, there would
    be 5000 x 5000 = 25 000 000 (25 million) possibilities.
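
    Each of these guesstimates is a plain product. A minimal Python sketch,
    assuming, as the text does, that parts are combined in a fixed order and
    that the same part may be used more than once. The count for unordered
    choices of distinct radicals is printed alongside for comparison; the
    helper name ordered_with_repeats is just illustrative.

```python
from itertools import product
from math import comb


def ordered_with_repeats(parts: int, length: int) -> int:
    """Sequences of `length` components drawn from `parts` choices,
    with order significant and repetition allowed."""
    return parts ** length


# Brute-force check of the formula on a tiny set of 4 parts.
assert ordered_with_repeats(4, 3) == sum(1 for _ in product(range(4), repeat=3))

languages_estimate = 200 * 5000                  # 200 languages x 5000 characters each
radical_estimate = ordered_with_repeats(200, 3)  # 3 parts from ~200 radicals
pair_estimate = ordered_with_repeats(5000, 2)    # 2 of the ~5000 known characters

# For comparison only: 3 distinct radicals, order ignored.
distinct_unordered = comb(200, 3)

print(f"{languages_estimate:,}")  # 1,000,000
print(f"{radical_estimate:,}")    # 8,000,000
print(f"{pair_estimate:,}")       # 25,000,000
print(f"{distinct_unordered:,}")  # 1,313,400
```

    Either way of counting gives figures far larger than the present
    repertoire, which is the point of these estimates.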

    The Chinese encoding GB 18030 has over 40% more possible code points
    (about 1.6 million) than the present Unicode standard (about 1.1
    million); maybe they know something we don't about how many characters
    will need to be encoded.
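
    The comparison of code spaces can be checked from the byte structure of
    each encoding. A rough Python count, assuming the standard GB 18030 byte
    ranges listed in the comments and counting every possible sequence,
    including ranges that are reserved or unassigned (the Unicode figure
    likewise includes the surrogate range):

```python
# Raw code-space sizes, counted from each encoding's structure.
UNICODE_CODE_POINTS = 17 * 0x10000  # 17 planes: U+0000..U+10FFFF

# GB 18030 byte ranges:
#   1 byte : 0x00-0x7F
#   2 bytes: lead 0x81-0xFE, trail 0x40-0x7E or 0x80-0xFE
#   4 bytes: 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39
lead = 0xFE - 0x81 + 1                                  # 126 lead bytes
digit = 0x39 - 0x30 + 1                                 # 10 ASCII-digit bytes
two_byte_trail = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)  # 63 + 127 = 190

one_byte = 0x80                           # 128
two_byte = lead * two_byte_trail          # 23,940
four_byte = lead * digit * lead * digit   # 1,587,600
gb18030_total = one_byte + two_byte + four_byte

ratio = gb18030_total / UNICODE_CODE_POINTS

print(f"Unicode : {UNICODE_CODE_POINTS:,}")  # Unicode : 1,114,112
print(f"GB 18030: {gb18030_total:,}")        # GB 18030: 1,611,668
print(f"ratio   : {ratio:.2f}")              # ratio   : 1.45
```

    On these raw counts GB 18030 offers roughly 1.6 million sequences
    against Unicode's 1.1 million code points.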

    John Knightley

    Quoting Ruszlan Gaszanov:

    >> Furthermore, some existing sets of PUA characters, being over 65536 in
    >> number, already cover plane 15 and part of plane 16. Some of us would
    >> be delighted if there was more PUA space.
    > BTW, just curious, where did someone find all those characters to
    > fill up the entire Plane 15 and part of 16? That should be about as
    > many (if not more) as officially encoded in ISO-10646 right now.
    > Ruszlán

