Re: Invalid code points

From: Hans Aberg
Date: Mon Jun 01 2009 - 02:21:39 CDT

    On 1 Jun 2009, at 00:25, Doug Ewell wrote:

    >> I think also strictly speaking there are two UTF-8s: one which does
    >> not have the integer limitations that are used in Unicode. This
    >> could be used to convert integer sequences into byte sequences
    >> which then do not have Unicode character interpretation.
    > There is only one UTF-8, the one defined by Unicode and ISO/IEC
    > 10646, which maps valid Unicode/10646 scalar values to sequences of
    > bytes. Anything else is not UTF-8. Keep repeating this to yourself.

    I was just reading the successive RFCs that define UTF-8.

    The last one restricts UTF-8 to the Unicode range, that is, to the
    limits imposed by UTF-16, but the earlier ones do not.
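
    To make the difference concrete, here is a minimal Python sketch,
    not taken from any of the RFCs' text. It assumes the wider variant
    uses the 31-bit, up-to-six-byte layout of the earlier definitions,
    and that the restricted variant accepts only Unicode scalar values
    (nothing above 0x10FFFF, no surrogates). The names encode_wide and
    encode_strict are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch only: encode_wide and encode_strict are hypothetical names.
# Each entry: (largest integer for this sequence length, lead-byte prefix bits).
_RANGES = [
    (0x7F, 0x00),        # 1 byte:  0xxxxxxx
    (0x7FF, 0xC0),       # 2 bytes: 110xxxxx + 1 continuation byte
    (0xFFFF, 0xE0),      # 3 bytes: 1110xxxx + 2 continuation bytes
    (0x1FFFFF, 0xF0),    # 4 bytes: 11110xxx + 3 continuation bytes
    (0x3FFFFFF, 0xF8),   # 5 bytes: 111110xx + 4 continuation bytes
    (0x7FFFFFFF, 0xFC),  # 6 bytes: 1111110x + 5 continuation bytes
]


def encode_wide(n: int) -> bytes:
    """Encode any integer in 0..0x7FFFFFFF with the wider, 31-bit layout."""
    if n < 0 or n > 0x7FFFFFFF:
        raise ValueError("integer outside the 31-bit range")
    # Pick the shortest sequence length whose range contains n.
    for length, (limit, prefix) in enumerate(_RANGES, start=1):
        if n <= limit:
            break
    if length == 1:
        return bytes([n])
    # Peel off six bits at a time, lowest bits first, as continuation bytes.
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (n & 0x3F))
        n >>= 6
    # The remaining high bits go into the lead byte.
    return bytes([prefix | n] + tail[::-1])


def encode_strict(n: int) -> bytes:
    """Same layout, but restricted to Unicode scalar values."""
    if n > 0x10FFFF:
        raise ValueError("above the Unicode range (the UTF-16 limit)")
    if 0xD800 <= n <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    return encode_wide(n)


if __name__ == "__main__":
    print(encode_strict(0x20AC).hex())   # a value both variants accept
    print(encode_wide(0x110000).hex())   # accepted only by the wider layout
```

    Inside the Unicode range the two functions produce identical bytes;
    they differ only in which integers they accept, which is the point
    under discussion.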

