Re: Hanzi trad-simp folding and z-variants

From: john knightley <>
Date: Sun, 9 Jun 2013 23:03:41 +0800

On Sun, Jun 9, 2013 at 7:29 PM, Stephan Stiller wrote:

>> Given the way the Cheung-Bauer list was compiled, it is certainly hard to
>> see how most of the characters would be widely known.
> I'd need to look at C&B again for accurate numbers, but to some extent
> it's simply because some syllable-morphemes are listed with many different
> attested possibilities. So one really wouldn't expect to need all ≈1000
> characters in there.
> There is a tricky aspect to this, though: the left-addition of "o" (or a
> mouth radical) leaves the exact number a bit open and allows for a larger
> count. Do you write some Cantonese-only syllable-morpheme as "X" or
> "⿰口X"/"oX"? (Most of the latter combinations are in fact in C&B, but,
> anyways, it's hard to give a precise answer to the "how many Cantonese
> characters" question.) Here is an example: 嚿 vs 舊 for the measure word gau6
> ("lump"). Depending on whom you ask, you might even find a strong opinion.
> Most people will probably say that "嚿 is better", but the fact that you
> find 舊 (because it's more straightforward to type) means that in a way it's
> descriptively correct. There are cases where the variant without a mouth
> would be regarded as more common or natural, because the version with a
> mouth radical is typographically rare.
>> With Zhuang Sawndip I have been examining texts from different locations
>> and eras, and there is evidence of transmission from generation to
>> generation, of progression, and also of instability.
> Just curious: what is a rough character count?
   There are a number of dialects, which pushes the numbers up a little. The
only published dictionary has just over ten thousand characters, of which
just over half are not in Unicode yet. The count of Sawndip I have from
different texts and research published in China is currently around twenty
thousand, with ten thousand not in Unicode.

    However, the currently published material represents only a fraction of
the whole. My best estimate is that the total number of Sawndip currently in
circulation is 50 to 100 thousand, of which 20 to 30 thousand are presently
in Unicode.


> Stephan
Received on Sun Jun 09 2013 - 10:08:32 CDT
