RE: UTF8 vs. Unicode (UTF16) in code

From: Peter_Constable@sil.org
Date: Fri Mar 09 2001 - 14:41:37 EST


On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote:

> Um... no. The UTF-32 CES can handle much more than the current
>space of the Unicode CCS. As far as I can tell, it's good to go until we
>need more than 32 bits to represent the ACR. I'm actually surprised that
>this comment was so misunderstood. Ah, well...

Strictly speaking, I'm afraid you're wrong. The UTF-32 encoding form is
defined in UTR #19, which clearly states:

<quote>
     UTF-32 is restricted in values to the range 0..10FFFF₁₆
</quote>

Unsigned 32-bit integers can directly represent 4G (2^32) distinct values;
UTF-32 can accommodate far fewer: only the 1,114,112 code points in the
range 0..10FFFF.
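
To put numbers on that, here is a minimal Python sketch. It assumes nothing
beyond the upper bound quoted from UTR #19; the names MAX_CODE_POINT and
is_in_utf32_range are made up for illustration, and the check covers only
the range restriction, not any other rule (surrogates, for instance).

```python
# Upper bound of the UTF-32 range, as quoted from UTR #19.
# MAX_CODE_POINT and is_in_utf32_range are illustrative names only.
MAX_CODE_POINT = 0x10FFFF


def is_in_utf32_range(value: int) -> bool:
    """Return True if value lies in the range 0..10FFFF (hex)."""
    return 0 <= value <= MAX_CODE_POINT


# How many values each representation can hold.
utf32_values = MAX_CODE_POINT + 1   # size of the range 0..10FFFF
uint32_values = 2 ** 32             # every unsigned 32-bit pattern

print(f"UTF-32 range:    {utf32_values:,}")
print(f"unsigned 32-bit: {uint32_values:,}")
print(is_in_utf32_range(0x10FFFF), is_in_utf32_range(0x110000))
```

So a value such as 0x110000 fits comfortably in a 32-bit integer but falls
outside what UTF-32 permits.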

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


