RE: UTF8 vs. Unicode (UTF16) in code

From: Marco Cimarosti
Date: Fri Mar 09 2001 - 04:42:07 EST

Addison P. Phillips wrote:
> [...]
> currently there are no characters "up there" this isn't a really big
> deal. Shortly, when Unicode 3.1 is official, there will be 40K or so
> characters in the supplemental planes... but they'll be
> relatively rare.

This reminds me of a question that I have wanted to ask for a long time: how
rare is the most common of the characters in the supplementary planes?
Hmmmm... Maybe I should be clearer.

Is there at least one character > U+FFFF that is commonly used in at
least one modern language?

I am wondering especially about the CJK characters in Extension B. We all
know that the majority of them are rare, ancient or idiosyncratic
characters, but I am not quite sure that this is true for *all* of them.

I think that this is an important question for deciding whether an
application should use 32-bit or 16-bit characters internally, and whether
an application has to be fully UTF-16 aware or can be "UTF-16 ignorant".
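
To make the distinction concrete, here is a minimal sketch in Python of what
goes wrong for a "UTF-16 ignorant" application. The function names are
hypothetical and chosen only for illustration. A character above U+FFFF takes
two 16-bit code units (a surrogate pair) in UTF-16, so code that treats one
code unit as one character miscounts it and may split it in half.

```python
from typing import Tuple


def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs to encode `text`."""
    # utf-16-le writes no byte order mark, so bytes // 2 == code units.
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


bmp_char = "\u4e2d"        # U+4E2D, a CJK ideograph inside the BMP
ext_b_char = "\U00020000"  # U+20000, first code point of CJK Extension B

print(utf16_code_units(bmp_char))    # one code unit
print(utf16_code_units(ext_b_char))  # two code units (a surrogate pair)
print([hex(u) for u in surrogate_pair(0x20000)])
```

An application using 32-bit characters internally never sees the pair; one
using 16-bit characters has to handle it wherever it counts, indexes or
truncates text.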

E.g., imagine designing an application that will be localized in Cantonese:
it is important to know whether all characters needed in Cantonese are in
the BMP, or if some of them are in Extension B.
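
One rough way to answer this for a given language would be to scan a
representative body of text and list every character that falls outside the
BMP. The sketch below assumes such a text sample is available as a Python
string; the helper names and the sample strings are made up for illustration
and say nothing about which characters Cantonese actually needs.

```python
def non_bmp_characters(text):
    """Return the distinct characters above U+FFFF found in `text`, sorted."""
    return sorted({ch for ch in text if ord(ch) > 0xFFFF})


def is_bmp_only(text):
    """True if every character in `text` lies in the BMP (<= U+FFFF)."""
    return not non_bmp_characters(text)


# Hypothetical samples: the first uses BMP ideographs only, the second has
# U+20000 added purely as an arbitrary Extension B placeholder.
sample_bmp = "\u5ee3\u6771\u8a71"
sample_mixed = sample_bmp + "\U00020000"

print(is_bmp_only(sample_bmp))
print(is_bmp_only(sample_mixed))
print(["U+%04X" % ord(ch) for ch in non_bmp_characters(sample_mixed)])
```

An empty result only shows that the sample needs nothing beyond the BMP; it
cannot prove that the language never does.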

_ Marco
