Re: 8-bit text which is supposed to be UTF-8 but isn't

From: Doug Ewell (dewell@compuserve.com)
Date: Mon Jan 31 2000 - 09:38:30 EST


Whoops! I made a significant mistake. I wrote:

>> ISO 10646 is 31 bits. All possible values should be allowed.
>> I do not know why Unicode have decided to grow their bits to
>> more than 16 bits, but not to all 31 bits of ISO 10646.
>> But that is no reason to not allow full 31 bits in UTF-8 encoded
>> text.
>
> There IS a reason: to allow all of Unicode to be expressed in UTF-8.

which may have been what caused Dan to reply:

> Yes, UTF-16 was done right. Unfortunately UTF-8 was done wrongly.
> UTF-8 should just like UTF-16 is compatible with code in the 16-bit
> space, been compatible with the first characters of 8 bits.

Of course, I should have said "to allow all of Unicode to be expressed
in UTF-16." UTF-8, at least in its original RFC 2279 incarnation, does
indeed allow the encoding of 31-bit ISO 10646. My bad.
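
To make the difference concrete, here is a minimal sketch of the RFC 2279
byte layout, which runs from one to six bytes and so reaches every value up
to 0x7FFFFFFF. UTF-16 with surrogate pairs stops at 0x10FFFF. The function
name encode_rfc2279 is made up for illustration and comes from no library,
and the sketch makes no attempt to reject surrogates or other values a real
encoder might refuse.

```python
def encode_rfc2279(code_point: int) -> bytes:
    """Encode one 31-bit value using the RFC 2279 UTF-8 byte layout.

    Hypothetical helper for illustration only; it performs no checks
    beyond the 31-bit range limit.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if code_point < 0x80:
        # One-byte form: identical to 7-bit ASCII.
        return bytes([code_point])

    # (largest value for this form, total byte count, lead-byte marker)
    forms = [
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ]
    for limit, length, marker in forms:
        if code_point <= limit:
            break

    # Peel off six payload bits per continuation byte, lowest bits first.
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6

    # Whatever bits remain go into the lead byte after its marker.
    return bytes([marker | code_point] + tail[::-1])
```

For example, encode_rfc2279(0x10FFFF), the largest value UTF-16 can reach,
takes four bytes, while encode_rfc2279(0x7FFFFFFF) takes the full six.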

-Doug Ewell
 Fullerton, California


