Re: Invalid code points (was: Re: unicode Digest V10 #106)

From: Doug Ewell (
Date: Mon Jun 01 2009 - 22:38:26 CDT

    Andrew Lipscomb <ewwa at chattanooga dot net> wrote:

    >> In particular, it would be great to know if the range U+0080, …,
    >> U+009F is invalid.
    > Those code points (encoded properly) are valid. However, their
    > appearance may indicate that an error occurred in processing, as the
    > C1 controls would be rare in real Unicode text (and, with the
    > exception of U+0085, are discouraged in XML). They most often arise by
    > treating Windows-1252 as if it were ISO-Latin-1.
    > In other words, not invalid, but suspicious.

    But once again, this is a question of the accuracy or fidelity of the
    input data, before it was converted to UTF-8. It has nothing to do with
    the validity of the Unicode characters from U+0080 to U+009F, nor of
    their UTF-8 representations.
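
To make the distinction concrete, here is a minimal Python sketch of the usual failure mode: Windows-1252 bytes decoded as ISO 8859-1 yield C1 controls, which then encode to perfectly valid UTF-8. The sample bytes and the helper name `find_c1_controls` are assumptions chosen for illustration, not anything from this thread.

```python
# Sketch: how C1 controls (U+0080..U+009F) typically end up in Unicode text.
# The sample bytes and the helper find_c1_controls are illustrative assumptions.

def find_c1_controls(text):
    """Return (index, code point) pairs for every C1 control in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]

# Windows-1252 bytes for "hi" in curly quotes (0x93 and 0x94).
raw = b"\x93hi\x94"

right = raw.decode("cp1252")   # intended reading: U+201C, 'h', 'i', U+201D
wrong = raw.decode("latin-1")  # mislabelled reading: U+0093, 'h', 'i', U+0094

# The mislabelled string is still valid Unicode and round-trips through UTF-8.
utf8 = wrong.encode("utf-8")
assert utf8.decode("utf-8") == wrong

print(find_c1_controls(right))  # no C1 controls in the correct decoding
print(find_c1_controls(wrong))  # C1 controls at the two quote positions
```

A check like this can only flag text as suspicious. It says nothing about validity, since both decodings produce well-formed Unicode and well-formed UTF-8.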

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
