From: Stefan Persson (email@example.com)
Date: Fri May 07 2004 - 13:22:37 CDT
Clark Cox wrote:
>> also that
>> UTF-8 encoded sequences can be up to 5 bytes long...
> How is that possible. I was under the impression that a UTF-8
> sequence could never be more than 4 bytes (i.e. U+10FFFF becomes F4 8F
> BF BF).
Unicode and ISO/IEC 10646 define UTF-8 differently: Unicode stops at 4
bytes, while ISO/IEC 10646 allows up to 6. However, every sequence
longer than 4 bytes is either an illegal sequence or encodes a code
point above U+10FFFF, which is itself illegal.
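
To make the byte counts concrete, here is a rough sketch in Python. The helper names utf8_length and is_valid_unicode_utf8 are made up for illustration, and the length table assumes the original ISO/IEC 10646 scheme of up to 6 bytes covering 31 bits. The validity check simply relies on Python's built-in UTF-8 decoder, which follows the Unicode 4-byte limit.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed under the original UTF-8 scheme (up to 6 bytes, 31 bits).

    Hypothetical helper; the limits are the largest value each length can hold.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for length, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return length
    raise ValueError("value does not fit in 31 bits")


def is_valid_unicode_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder (Unicode rules) accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The highest Unicode code point, U+10FFFF, needs exactly 4 bytes.
highest = chr(0x10FFFF).encode("utf-8")
print(highest.hex(" "), utf8_length(0x10FFFF))

# The first value past the 4-byte range would need a 5-byte sequence.
print(utf8_length(0x200000))

# A well-formed-looking 5-byte pattern (lead byte F8) is rejected under Unicode.
five_bytes = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])
print(is_valid_unicode_utf8(five_bytes))
```

The 5- and 6-byte forms only become necessary for values from 0x200000 upward, all of which lie beyond U+10FFFF, so a decoder that follows the Unicode definition never has to accept them.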