RE: UTF8 vs. Unicode (UTF16) in code

From: Yves Arrouye (yves@realnames.com)
Date: Fri Mar 09 2001 - 14:20:06 EST


> > On 03/08/2001 07:40:25 PM "Ayers, Mike" wrote:
> >
> > > If you really want to finish the job, there's always UTF-32, which
> > > should do rather nicely until we meet the space aliens with the
> > > 4,293,853,186 character alphabet!
> >
> > Um... no. The 1,113,023 character alphabet (one more than the encodable
> > scalar values in the codespace supported by UTF-8 / 16 / 32).
> >
>
> Um... no. The UTF-32 CES can handle much more than the current
> space of the Unicode CCS. As far as I can tell, it's good to go until we
> need more than 32 bits to represent the ACR. I'm actually surprised that
> this comment was so misunderstood. Ah, well...

Since the U in UTF stands for Unicode, UTF-32 cannot represent more than
what Unicode encodes, which is 1+ million code points. Otherwise, you're
talking about UCS-4. But I thought that one of the latest revs of ISO 10646
explicitly specified that UCS-4 will never encode more than what Unicode can
encode, and thus definitely not the 4 billion characters you're alluding to.
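
To put rough numbers on the difference, here is a minimal sketch in Python.
It assumes the Unicode codespace ends at U+10FFFF and that the surrogate
range U+D800..U+DFFF is excluded from the scalar values; the names
(MAX_CODE_POINT, is_scalar_value, and so on) are made up for illustration
and do not come from any library.

```python
# Sketch: size of the Unicode codespace versus the range of a 32-bit unit.
# All names here are illustrative assumptions, not part of any standard API.

MAX_CODE_POINT = 0x10FFFF            # assumed upper bound of the Unicode codespace
SURROGATES = range(0xD800, 0xE000)   # code points reserved for UTF-16 surrogates


def is_scalar_value(value: int) -> bool:
    """Return True if value is a code point that UTF-8/16/32 can all encode."""
    return 0 <= value <= MAX_CODE_POINT and value not in SURROGATES


code_points = MAX_CODE_POINT + 1                 # 17 planes of 65,536 code points
scalar_values = code_points - len(SURROGATES)    # code points minus 2,048 surrogates
thirty_two_bit_values = 2 ** 32                  # what a raw 32-bit unit could hold

print(f"code points:   {code_points:,}")
print(f"scalar values: {scalar_values:,}")
print(f"32-bit values: {thirty_two_bit_values:,}")
```

Under those assumptions, a 32-bit value above 0x10FFFF fits in the code unit
but is not something UTF-32 is allowed to carry, which is the distinction
being made above.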

YA


