Re: Multibyte languages - Chinese - double bytes or more bytes in each character

From: James Kass (thunder-bird@earthlink.net)
Date: Tue Dec 23 2008 - 14:01:04 CST


Yiru Chen wrote,

>I'm working on a multi-byte language project. I know Chinese is a
>multibyte language, but not sure if that means each Chinese character
>has two bytes or more than one byte but varies (i.e., a variable number
>of bytes, can be two, three or more bytes)?

Many scripts (and symbols) in Unicode require more than one
byte per character. How many bytes a character requires depends
upon the "Unicode transformation format" in use (UTF-8, UTF-16,
or UTF-32) and upon the character's code position within the
standard, as explained here:

http://unicode.org/faq/utf_bom.html

(Please see "Q. What are some of the differences between the
UTFs?" for minimal and maximal bytes per character information.)
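
As a rough illustration of how the byte count varies with the
transformation format, here is a minimal Python sketch. The
helper name bytes_per_character and the three sample characters
are my own choices for the example, not something taken from
the FAQ. The big-endian codec names are used only so that no
byte order mark is added to the counts.

```python
# Hypothetical helper (not from the FAQ): reports how many bytes one
# character occupies in each Unicode transformation format.
def bytes_per_character(ch):
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-be" variants emit no byte order mark, so the length
        # reflects the character alone.
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }

# Sample characters chosen for illustration:
#   U+0041  Latin capital letter A
#   U+4E2D  a common Chinese character in the Basic Multilingual Plane
#   U+20000 a Chinese character outside the Basic Multilingual Plane
samples = ["\u0041", "\u4e2d", "\U00020000"]

for ch in samples:
    print("U+%04X" % ord(ch), bytes_per_character(ch))
```

For these samples, U+4E2D takes three bytes in UTF-8 but two in
UTF-16, while U+20000 takes four bytes in every format, so the
answer to "two bytes or variable?" depends on the format chosen.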

Here is an article by Mark Davis which explores these aspects:
http://www.icu-project.org/docs/papers/forms_of_unicode/

Best regards,

James Kass


