Re: Why does tr36 say to not have > 4 byte utf8 nor go outside 10FFFF ?

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Dec 24 2009 - 11:12:59 CST

On 12/24/2009 8:24 AM, karl williamson wrote:
> For its other recommendations, it gives scenarios for why, but I
> didn't see any for these. Could someone please explain it for me?
>
> Thanks
>
>
UTF-8 sequences longer than 4 bytes, and UTF-32 code units > 10FFFF,
encode values that are not representable in UTF-16 (surrogate pairs
reach only as far as 10FFFF), breaking the full interconvertibility of
the UTFs.
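
To make the UTF-16 limit concrete, here is a minimal sketch of the surrogate-pair arithmetic. The function name decode_surrogate_pair and the constant names are hypothetical, chosen only for this illustration; the point is that the largest possible pair yields exactly 10FFFF, so nothing above it has a UTF-16 form.

```python
# Illustrative sketch only; names are hypothetical, not from any library.
HIGH_MAX = 0xDBFF  # last high (lead) surrogate
LOW_MAX = 0xDFFF   # last low (trail) surrogate


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest pair gives the largest code point UTF-16 can express.
max_code_point = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
print(hex(max_code_point))  # 0x10ffff
```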

In open interchange, any UTF-16-based implementation would be unable to
represent these values, possibly leading to undefined behavior.

In other words, while these values might seem to be algorithmically or
numerically possible, they are not valid Unicode code points, and thus
have no business being part of Unicode data streams.
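
As a rough illustration of rejecting such values at the boundary, the sketch below checks a number against the code point range and lets Python's built-in UTF-8 decoder judge two out-of-range byte sequences. The helper names is_code_point and rejects_as_utf8 are assumptions made for this example, not part of any standard API.

```python
# Illustrative sketch only; helper names are hypothetical.
MAX_CODE_POINT = 0x10FFFF


def is_code_point(value: int) -> bool:
    """True if value lies in the Unicode code point range 0..10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


def rejects_as_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# A 5-byte sequence in the old extended pattern (would encode 0x200000).
five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])
# A 4-byte sequence that would encode 0x110000, one past the limit.
above_max = bytes([0xF4, 0x90, 0x80, 0x80])
# The 4-byte encoding of 10FFFF itself, which is well-formed.
at_max = "\U0010FFFF".encode("utf-8")

print(is_code_point(0x10FFFF), is_code_point(0x110000))
print(rejects_as_utf8(five_byte), rejects_as_utf8(above_max), rejects_as_utf8(at_max))
```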

A./
