Re: Invalid code points

From: Hans Aberg (haberg@math.su.se)
Date: Mon Jun 01 2009 - 02:21:39 CDT

    On 1 Jun 2009, at 00:25, Doug Ewell wrote:

    >> I think also strictly speaking there are two UTF-8s: one which does
    >> not have the integer limitations that are used in Unicode. This
    >> could be used to convert integer sequences into byte sequences
    >> which then do not have Unicode character interpretation.
    >
    > There is only one UTF-8, the one defined by Unicode and ISO/IEC
    > 10646, which maps valid Unicode/10646 scalar values to sequences of
    > bytes. Anything else is not UTF-8. Keep repeating this to yourself.

    I was just reading the successive RFCs that define UTF-8:
       http://tools.ietf.org/html/rfc2044
       http://tools.ietf.org/html/rfc2279
       http://tools.ietf.org/html/rfc3629

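    The last one restricts UTF-8 to the Unicode range, U+0000..U+10FFFF,
    which is the limit imposed by UTF-16. The two earlier ones do not:
    they allow sequences of up to six bytes, covering values up to
    0x7FFFFFFF.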
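
    To make the difference concrete, here is a minimal sketch in Python.
    The function name encode_utf8_like and its strict flag are my own
    inventions for illustration and are not taken from any of the RFCs.
    With strict=True it applies the RFC 3629 limits; with strict=False
    it accepts the wider 31-bit range of the older RFCs, using the same
    bit layout.

```python
def encode_utf8_like(n: int, strict: bool = True) -> bytes:
    """Encode one non-negative integer with the UTF-8 bit layout.

    strict=True  : RFC 3629 limits (at most U+10FFFF, no surrogates).
    strict=False : the wider range of RFC 2044/2279, up to 0x7FFFFFFF.
    """
    if n < 0:
        raise ValueError("negative values cannot be encoded")
    if strict:
        if n > 0x10FFFF:
            raise ValueError("beyond U+10FFFF: not UTF-8 per RFC 3629")
        if 0xD800 <= n <= 0xDFFF:
            raise ValueError("surrogate code point: not a scalar value")
    elif n > 0x7FFFFFFF:
        raise ValueError("beyond 31 bits: not encodable in 6 bytes")

    if n < 0x80:
        return bytes([n])  # single byte, identical to ASCII

    # (exclusive upper limit, total bytes, lead-byte marker)
    for limit, length, marker in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if n < limit:
            break

    out = []
    for _ in range(length - 1):
        out.append(0x80 | (n & 0x3F))  # continuation byte: 10xxxxxx
        n >>= 6
    out.append(marker | n)             # lead byte carries the top bits
    return bytes(reversed(out))


# Inside the Unicode range both modes agree with Python's own encoder.
print(encode_utf8_like(0x20AC) == "\u20ac".encode("utf-8"))

# One past U+10FFFF: rejected in strict mode, four bytes otherwise.
print(encode_utf8_like(0x110000, strict=False).hex())
```

    In this sketch only the range check differs between the two modes.
    Whether the non-strict output should be called UTF-8 at all is the
    point under discussion above.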

       Hans


