Re: U+xxxx, U-xxxxxx, and the basics

From: Dan Oscarsson (Dan.Oscarsson@trab.se)
Date: Mon Mar 06 2000 - 02:58:24 EST


>Actually, surrogate pairs are better designed than that. In UTF-16, no
>16-bit value is "overloaded" or used for anything other than itself. That
>is, 0x212B is always ANGSTROM and never anything but ANGSTROM. It is *never*
>half of a surrogate pair (in the same way that no byte in a UTF-8 character
>is in-and-of-itself a character in UTF-8).

But UTF-8 is not as well designed as UTF-16. UTF-16 does not "overload"
any value used in Unicode (UCS-2), i.e. the 16-bit representation.
Unfortunately, UTF-8 "overloads" values used in what would be UCS-1
(codes 0-255), i.e. the 8-bit representation: codes 128-255 are each
encoded as two bytes, and those byte values are themselves codes in the
0-255 range. This creates a conflict for those who mostly need codes
0-255.
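
To make the point concrete, here is a minimal sketch in Python. It is
an illustration only; the sample character U+00E5 and the variable
names are my own choices, and "latin-1" stands in for what I call
UCS-1 above.

```python
# Illustration only: how a code in the 0-255 range looks in each encoding.
ch = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE (code 229)

latin1 = ch.encode("latin-1")  # one byte, equal to the code itself
utf8 = ch.encode("utf-8")      # two bytes, both in the range 0x80-0xFF

# Each UTF-8 byte, read as an 8-bit code, is itself a valid 0-255 character,
# so an 8-bit reader silently sees two different characters.
misread = utf8.decode("latin-1")

# For comparison: UTF-16 code units of a character beyond 16 bits (U+10000).
utf16 = "\U00010000".encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# Both units fall in the reserved surrogate block (0xD800-0xDFFF),
# which is never used for a character on its own.
is_surrogate = [0xD800 <= u <= 0xDFFF for u in units]

print(latin1, utf8, repr(misread), [hex(u) for u in units], is_surrogate)
```

So the 8-bit values used by UTF-8 collide with existing 0-255 codes,
while the 16-bit values used for surrogates collide with nothing.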

   Dan
