Re: length of text by different languages

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Mar 06 2003 - 17:33:09 EST

Next message: Yung-Fong Tang: "Re: length of text by different languages"

Previous message: Yung-Fong Tang: "Re: length of text by different languages"
In reply to: Francois Yergeau: "RE: length of text by different languages"
Next in thread: Ram Viswanadha: "Re: length of text by different languages"
Reply: Ram Viswanadha: "Re: length of text by different languages"

Francois Yergeau wrote:

>ftang@netscape.com wrote:
>
>
>>I remember there was some study showing that, although UTF-8 encodes each
>>Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use
>>FEWER characters in writing to communicate information than
>>alphabet-based languages.
>>
>>Can anyone point me to such research?
>>
>>
>
>I don't know exactly what you want, but I vaguely remember a paper given
>at a Unicode conference long ago that compared various translations of the
>charter (or some such) of the Voice of America in two or three
>encodings. Hmmmm, let's see.... could be this:
>
>http://www.unicode.org/iuc/iuc9/Friday2.html#b3
>Reuters Compression Scheme for Unicode (RCSU)
>Misha Wolf
>
Yes, that could be it. I have a hard copy, and Fig 2 looks like the one
I am looking for.
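The trade-off in the original question (3 UTF-8 bytes per Han character versus fewer characters overall) can be measured directly. Below is a minimal Python sketch; the helper name `text_sizes` is hypothetical, and the sample pair ("hello" / "你好", the same greeting) is only an illustration, not data from the paper above.

```python
def text_sizes(text: str) -> dict[str, int]:
    """Hypothetical helper: character count and encoded sizes of `text`."""
    return {
        "chars": len(text),                            # code points, not grapheme clusters
        "utf8_bytes": len(text.encode("utf-8")),       # 1 byte per ASCII, 3 per BMP Han char
        "utf16_bytes": len(text.encode("utf-16-be")),  # big-endian codec writes no BOM
    }


# Illustrative pair only, not measured corpus data.
samples = {"English": "hello", "Chinese": "你好"}
for lang, text in samples.items():
    print(lang, text_sizes(text))
# English {'chars': 5, 'utf8_bytes': 5, 'utf16_bytes': 10}
# Chinese {'chars': 2, 'utf8_bytes': 6, 'utf16_bytes': 4}
```

With 3 UTF-8 bytes per BMP Han character against 1 per ASCII letter, a Chinese text comes out smaller in UTF-8 only when its English counterpart needs more than three times as many characters; in UTF-16, where both cost 2 bytes, fewer characters is enough.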

>
>No paper online, alas. I remember that Chinese was a clear winner in terms
>of number of characters. In fact, I kind of remember that Chinese was so much
>denser that it still won after RCSU (now SCSU) compression. Since SCSU still
>needs about 2 bytes per Han character but only 1 per Latin letter, that would
>mean that a Han character contains more than twice as much info on average as
>a Latin letter as used in (say) English.
>
>This is all on pretty shaky ground, distant memories. Perhaps Misha still
>has the figures (if that's in fact the right paper).
>
>
>
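The "more than twice as much info" inference can be checked with a rough cost model. This Python sketch is NOT a real SCSU encoder: it assumes 1 byte per character in U+0000..U+00FF (single-byte mode, with Latin-1 reached through SCSU's default window), 2 bytes per other BMP character such as Han (Unicode mode), 4 bytes per supplementary character, and it ignores tag bytes. The names `approx_scsu_bytes` and `implied_chars_ratio` and the sample strings are hypothetical.

```python
def approx_scsu_bytes(text: str) -> int:
    """Hypothetical, simplified SCSU size estimate (not a real encoder).

    Assumed costs: U+0000..U+00FF -> 1 byte (single-byte mode),
    other BMP code points -> 2 bytes (Unicode mode, UTF-16BE),
    supplementary code points -> 4 bytes (surrogate pair).
    Mode-switch tags and window definitions are ignored.
    """
    size = 0
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            size += 1
        elif cp <= 0xFFFF:
            size += 2
        else:
            size += 4
    return size


def implied_chars_ratio(english: str, chinese: str) -> float:
    """English characters used per Han character for the same content."""
    return len(english) / len(chinese)


english, chinese = "hello", "你好"  # illustrative pair only, not corpus data
en_size, zh_size = approx_scsu_bytes(english), approx_scsu_bytes(chinese)
print(en_size, zh_size)  # 5 4
if zh_size < en_size:
    # 2 * n_zh < 1 * n_en  implies  n_en / n_zh > 2
    assert implied_chars_ratio(english, chinese) > 2
```

Under this model a pure-Han text costs 2 bytes per character and a pure-ASCII text 1, so the Chinese version can only be smaller if the English one uses more than twice as many characters, which is the bound stated above.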

This archive was generated by hypermail 2.1.5 : Thu Mar 06 2003 - 18:33:15 EST