Re: any unicode conversion tools?

From: Stefan Persson (alsjebegrijptwatikbedoel@yahoo.se)
Date: Fri May 07 2004 - 13:22:37 CDT


Clark Cox wrote:
>> Note also that UTF-8 encoded sequences can be up to 5 bytes long...
>
> How is that possible? I was under the impression that a UTF-8
> sequence could never be more than 4 bytes (i.e. U+10FFFF becomes F4 8F
> BF BF).

Unicode and ISO/IEC 10646 define UTF-8 differently: Unicode stops at 4
bytes, while ISO/IEC 10646 allows longer sequences (up to 6 bytes).
However, every sequence longer than 4 bytes is either an illegal
(overlong) sequence or encodes a value above U+10FFFF, which is not a
legal code point.
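
To make the byte counts concrete, here is a rough Python sketch of the
longer, 6-byte form of the encoding. The names encode_legacy_utf8,
MAX_FOR_LENGTH, LEAD_MARKER and UNICODE_MAX are made up for this
illustration and are not taken from either standard's text; the sketch
also ignores surrogates and does no decoding.

```python
# Largest value representable by each sequence length in the older,
# 6-byte form of UTF-8 (index 0 = 1 byte, index 5 = 6 bytes).
MAX_FOR_LENGTH = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
# Marker bits of the lead byte for each sequence length.
LEAD_MARKER = [0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC]
# Highest code point in the Unicode code space.
UNICODE_MAX = 0x10FFFF


def encode_legacy_utf8(value):
    """Encode an integer in the shortest form of the 6-byte scheme."""
    if not 0 <= value <= MAX_FOR_LENGTH[-1]:
        raise ValueError("value out of range for the 6-byte scheme")
    # Pick the shortest length whose range covers the value.
    length = next(i + 1 for i, top in enumerate(MAX_FOR_LENGTH) if value <= top)
    if length == 1:
        return bytes([value])
    out = []
    # Emit continuation bytes (10xxxxxx) from the low-order bits upward.
    for _ in range(length - 1):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(LEAD_MARKER[length - 1] | value)
    return bytes(reversed(out))


# The highest Unicode code point fits in 4 bytes: F4 8F BF BF.
print(encode_legacy_utf8(UNICODE_MAX).hex(" "))
# The smallest value needing 5 bytes is already above U+10FFFF.
print(hex(MAX_FOR_LENGTH[3] + 1), encode_legacy_utf8(MAX_FOR_LENGTH[3] + 1).hex(" "))
```

The shortest-form 5-byte range starts at 0x200000, which is past
U+10FFFF, so a 5- or 6-byte sequence either encodes something outside
the Unicode range or is an overlong form of a smaller value. Python's
own "utf-8" codec follows the 4-byte definition and refuses lead bytes
such as F8.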

Stefan


