Re: UTF8 vs. Unicode (UTF16) in code

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Fri Mar 09 2001 - 05:36:47 EST


On Fri, Mar 09, 2001 at 10:56:30AM -0800, Yves Arrouye wrote:
>
> Since the U in UTF stands for Unicode, UTF-32 cannot represent more than
> what Unicode encodes, which is 1+ million code points. Otherwise, you're
> talking about UCS-4. But I thought that one of the latest revs of ISO 10646
> explicitly specified that UCS-4 will never encode more than what Unicode
> can encode, and thus definitely not these 4 billion characters you're
> alluding to.

As far as I know, the U in UTF stands for Universal, not Unicode.
ISO 10646 can encode characters beyond the range of UTF-16, and should
retain this capability. There is a proposal to restrict UTF-8 to
encompass only the same values as UTF-16, but UCS-4 still encodes
the 31-bit code space.
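
To put rough numbers on the two ranges being compared, here is a minimal
Python sketch. The constant and function names are illustrative assumptions,
not taken from any standard or library, and the UTF-8 length table is the
original 31-bit form (up to six bytes), i.e. the form without the proposed
restriction.

```python
# Illustrative sketch only: all names here are hypothetical.

UTF16_MAX = 0x10FFFF    # highest value reachable through UTF-16 surrogate pairs
UCS4_MAX = 0x7FFFFFFF   # highest value in the 31-bit UCS-4 code space


def utf16_code_space() -> int:
    """Count the values addressable by UTF-16.

    0x10000 positions in the BMP (this count includes the 2048 surrogate
    code points themselves) plus 0x400 * 0x400 surrogate-pair combinations.
    """
    return 0x10000 + 0x400 * 0x400


def ucs4_code_space() -> int:
    """Count the values in the 31-bit UCS-4 code space."""
    return 2 ** 31


def fits_in_utf16(value: int) -> bool:
    """True if the value lies within the range UTF-16 can reach."""
    return 0 <= value <= UTF16_MAX


def utf8_length_31bit(value: int) -> int:
    """Byte length of a value in the original, unrestricted 31-bit UTF-8."""
    if not 0 <= value <= UCS4_MAX:
        raise ValueError("value outside the 31-bit code space")
    # Upper bounds (exclusive) for 1- to 6-byte sequences.
    limits = (0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)
    for length, limit in enumerate(limits, start=1):
        if value < limit:
            return length
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(utf16_code_space())             # 1114112
    print(ucs4_code_space())              # 2147483648
    print(utf8_length_31bit(UTF16_MAX))   # 4
    print(utf8_length_31bit(UCS4_MAX))    # 6
```

Under these assumptions, restricting UTF-8 to the UTF-16 range means no
sequence longer than four bytes is needed, while the full 31-bit space
requires the five- and six-byte forms.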

Kind regards
Keld


