Re: Why does tr36 say to not have > 4 byte utf8 nor go outside 10FFFF ?

From: Asmus Freytag (
Date: Thu Dec 24 2009 - 11:12:59 CST

    On 12/24/2009 8:24 AM, karl williamson wrote:
    > For its other recommendations, it gives scenarios for why, but I
    > didn't see any for these. Could someone please explain it for me?
    > Thanks
    Using sequences of more than 4 UTF-8 bytes or code units > 10FFFF in
    UTF-32 makes the data not representable in UTF-16, breaking the full
    interconvertibility of UTFs.
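Why U+10FFFF is the ceiling can be seen from the surrogate-pair arithmetic. The sketch below is illustrative; the helper names `to_utf16_units` and `utf8_len_original` are hypothetical and not from any library. A surrogate pair carries 20 bits on top of the 0x10000 offset, so the largest value it can express is 0x10000 + 0xFFFFF = 0x10FFFF. For comparison, it also computes sequence lengths under the original, pre-2003 UTF-8 design (RFC 2279, values up to 0x7FFFFFFF). Under that design, 4-byte sequences already reach 0x1FFFFF, and every 5- or 6-byte sequence encodes a value of at least 0x200000, which has no UTF-16 form.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF:
        # Nothing above U+10FFFF fits in a surrogate pair.
        raise ValueError(f"U+{cp:X} cannot be represented in UTF-16")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate code point, not a scalar value")
    if cp <= 0xFFFF:
        return [cp]                       # BMP: a single code unit
    offset = cp - 0x10000                 # 20 bits: 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


def utf8_len_original(value: int) -> int:
    """Byte length under the original RFC 2279 UTF-8 (hypothetical helper)."""
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for length, limit in enumerate(limits, start=1):
        if value < limit:
            return length
    raise ValueError("value exceeds 31 bits")


# The highest pair, DBFF DFFF, decodes to exactly U+10FFFF.
print([hex(u) for u in to_utf16_units(0x10FFFF)])
print(utf8_len_original(0x10FFFF), utf8_len_original(0x200000))
```

Running it prints `['0xdbff', '0xdfff']` and then `4 5`. U+10FFFF still fits in four bytes, while 0x200000, the smallest value that needs five bytes, is already far beyond anything UTF-16 can carry.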

    In open interchange, a UTF-16-based implementation would be unable to
    represent these values, possibly leading to undefined behavior.

    In other words, while these might seem to be algorithmic or numerical
    options, they are not valid Unicode code points, and thus have no
    business as part of Unicode data streams.
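In practice, keeping such values out of a data stream means a receiver rejects them at the decoding boundary. Python's built-in `bytes.decode("utf-8")` already does this. The hand-rolled validator below is only a sketch: it shows where the rules bite and is modeled on the well-formed UTF-8 byte sequence table in Chapter 3 of the Unicode Standard. The function name `is_valid_utf8` is hypothetical. Lead bytes F5..FF, including the old 5- and 6-byte leads, are refused outright. After F4, the second byte is capped at 8F, so nothing above U+10FFFF gets through. The same table also excludes overlong forms and encoded surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (sketch, hypothetical helper)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                          # ASCII
            i += 1
            continue
        # (sequence length, allowed range for the second byte)
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF    # no overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F    # excludes surrogates D800..DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF    # no overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F    # caps the value at U+10FFFF
        else:
            # Stray continuation byte, overlong lead (C0/C1), or F5..FF,
            # which covers every 5- and 6-byte lead byte.
            return False
        if i + length > n:
            return False                      # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(i + 2, i + length):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True
```

A design note: checking the second byte against a per-lead range is what lets the validator reject out-of-range values without decoding them into integers first. The caps on F4 and ED do the same work as the `cp > 0x10FFFF` and surrogate checks in a UTF-32 validator.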


    This archive was generated by hypermail 2.1.5 : Thu Dec 24 2009 - 11:17:20 CST