From: Jon Babcock (email@example.com)
Date: Thu Mar 06 2003 - 11:05:13 EST
Yung-Fong Tang wrote:
> I remember there was a study showing that although UTF-8 encodes each
> Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
> FEWER characters in writing to communicate information than
> alphabet-based languages.
For my commercial Japanese-to-English translation work, I
estimate from 2.3 to 3.2 Japanese characters for one word of
English, estimated at 6 characters. It varies depending on the
kanji to kana ratio in the source text.
For commercial contemporary Chinese-to-English translation, I
estimate 1.4 to 1.8 Chinese characters per English word,
estimated at 6 characters. (I just asked about this on a mailing
list devoted to C-E/E-C translation and the one translator who
responded said he uses 1.62 Chinese characters per English word
which agrees with my experience.)
Since a "word" is probably about the smallest chunk of meaning
you're going to find, this would suggest that where it takes 6
bytes to encode a word of English at one byte per character, at
3 bytes per character it will take from about 4.2 to 5.4 bytes
(1.4 to 1.8 characters) to encode the equivalent in Chinese, I
guess.
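The byte arithmetic behind these estimates can be checked with a short script. This is only a sketch under the figures given in this message: 6 one-byte characters per English word, 3 UTF-8 bytes per Japanese or Chinese character, and the character-per-word ranges quoted above. The names `bytes_per_english_word` and `estimates` are made up for illustration. Note that multiplying the character counts by 3 gives bytes per word-equivalent; dividing 6 by the same counts (about 4.3 to 3.3 for Chinese) gives English characters per Chinese character instead.

```python
# Assumption from the post: one English word is about 6 one-byte characters.
ENGLISH_BYTES_PER_WORD = 6
# Kana and common kanji/hanzi lie in the range that UTF-8 encodes in 3 bytes.
UTF8_BYTES_PER_CJK_CHAR = 3


def bytes_per_english_word(chars_per_word):
    """UTF-8 bytes needed for the CJK text equivalent to one English word."""
    return chars_per_word * UTF8_BYTES_PER_CJK_CHAR


# (low, high) characters per English word, as estimated in the post.
estimates = {
    "Japanese": (2.3, 3.2),
    "Chinese": (1.4, 1.8),
}

for language, (low, high) in estimates.items():
    low_bytes = bytes_per_english_word(low)
    high_bytes = bytes_per_english_word(high)
    print(f"{language}: {low_bytes:.1f} to {high_bytes:.1f} bytes "
          f"per word-equivalent (English: {ENGLISH_BYTES_PER_WORD})")

# Spot-check the 3-byte assumption on one kanji and one hiragana.
assert len("語".encode("utf-8")) == 3
assert len("か".encode("utf-8")) == 3
```

On these figures, Chinese comes out at roughly 4.2 to 5.4 bytes per word-equivalent, somewhat below the 6 bytes for English, while Japanese comes out at roughly 6.9 to 9.6 bytes, somewhat above it.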
The above applies to contemporary (modern) traditional Chinese.
I don't know if there is a practical difference in efficiency
between traditional and simplified. But from my experience with
classical Chinese, I would guess that most classical Chinese is
at least twice as efficient as modern Chinese. (This, plus its
freedom from any tight dependence on sound, facilitated its
great success as the language of culture throughout the
traditional kanji culture realm --- China, Korea, Japan,
Vietnam, etc., imo.)
-- Jon Babcock <firstname.lastname@example.org>
This archive was generated by hypermail 2.1.5 : Thu Mar 06 2003 - 11:56:45 EST