RE: UTF8 vs. Unicode (UTF16) in code

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Fri Mar 09 2001 - 15:24:21 EST


> From: Peter_Constable@sil.org [mailto:Peter_Constable@sil.org]
>
> On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote:
>
> > Um... no. The UTF-32 CES can handle much more than the current
> > space of the Unicode CCS. As far as I can tell, it's good to go
> > until we need more than 32 bits to represent the ACR. I'm actually
> > surprised that this comment was so misunderstood. Ah, well...

        Of course, looking back at the job I did of explaining it here, it's
no wonder there has been confusion. The original subject was the support of
Unicode in computer code. By using UTF-32 (or, more accurately, the 32-bit
datatype needed to represent it), the code would be future-proofed against
expansions to the Unicode character set, which I believe would continue to
be represented as a flat mapping in any updated UTF-32 spec, and which, it
is generally acknowledged, will happen only when we meet space aliens with
a really large alphabet. There is a lot of code floating about right now
that needs maintenance because it was not future-proofed only a few years
ago. My comment about the "4,293,853,186 character alphabet" was a simple
reminder that there are never any guarantees: even future-proofed software
can find a future for which it is unprepared, but the risk of that
happening can be made ridiculously small.
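
        For what it's worth, here is a minimal sketch in C of what I mean
by "the 32 bit datatype" (the type and function names are just illustrative,
not taken from any particular library):

    #include <stdint.h>
    #include <stddef.h>

    /* One UTF-32 code unit: a full 32-bit value, so the datatype does
       not need to change even if the character set grows well beyond
       today's Unicode code space. */
    typedef uint32_t utf32_char;

    /* Count code points in a zero-terminated UTF-32 string.  Because
       one code unit is one code point, no decoding is needed - unlike
       UTF-8 or UTF-16. */
    size_t utf32_strlen(const utf32_char *s)
    {
        size_t n = 0;
        while (s[n] != 0)
            n++;
        return n;
    }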

        Do I make sense yet?

/|/|ike


