Re: Invalid code points

From: Doug Ewell (doug@ewellic.org)
Date: Sun May 31 2009 - 19:23:39 CDT

    Ruszlán Gaszanov <ruszlan at ather dot net> wrote:

    > Well, even though C1 control codes are technically valid Unicode
    > characters, in practice, Unicode or ISO-8859-x streams containing
    > those code points are extremely rare. For most practical
    > purposes, the presence of those bytes in a text stream would likely
    > suggest a Windows 12xx codepage.

    Absolutely correct. If you see 0x80 in an "ISO 8859-1" text stream,
    it's very likely that the stream should have been interpreted as
    Windows-1252 instead.
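
    To make that concrete, here is a rough Python sketch of the check.
    The helper name has_c1_bytes and the sample bytes are made up for
    illustration, and the check is only a hint about mislabeling, not
    proof of it:

```python
def has_c1_bytes(data: bytes) -> bool:
    """Return True if any byte falls in the C1 range 0x80-0x9F."""
    return any(0x80 <= b <= 0x9F for b in data)


# Hypothetical sample: a stream labeled ISO 8859-1 that contains 0x80.
raw = b"price: \x80 5"

# Read as ISO 8859-1, 0x80 is the C1 control character U+0080.
as_latin1 = raw.decode("iso-8859-1")

# Read as Windows-1252, the same byte is U+20AC EURO SIGN.
as_cp1252 = raw.decode("cp1252")

if has_c1_bytes(raw):
    print("C1 bytes present; the stream may really be Windows-1252")
    print(repr(as_latin1), "vs", repr(as_cp1252))
```

    Whether to reinterpret the stream is still a judgment call, since a
    C1 control in ISO 8859-1 is legal, just unlikely.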

    But if you see {0xC2, 0x80} in a UTF-8 text stream, it's a perfectly
    valid encoding of U+0080, regardless of whether U+0080 was the right
    code point to begin with.
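
    A small Python sketch of the same point, using only the standard
    library. The variable names are illustrative:

```python
import unicodedata

# {0xC2, 0x80} is the well-formed UTF-8 encoding of U+0080.
decoded = bytes([0xC2, 0x80]).decode("utf-8")
category = unicodedata.category(decoded)  # "Cc" means a control character

# The round trip yields the same two bytes.
reencoded = "\u0080".encode("utf-8")

# A bare 0x80, by contrast, is a stray continuation byte in UTF-8.
try:
    b"\x80".decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False

print(hex(ord(decoded)), category, reencoded, lone_byte_is_valid)
```

    So a UTF-8 decoder has no grounds to reject the sequence. Any doubt
    about U+0080 belongs to whatever produced the code point, not to
    the encoding.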

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    


    This archive was generated by hypermail 2.1.5 : Sun May 31 2009 - 19:26:58 CDT