Re: Invalid code points (was: Re: unicode Digest V10 #106)

From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 01 2009 - 22:38:26 CDT

Next message: Doug Ewell: "Re: Invalid code points"

Previous message: David J. Perry: "Re: Old Italic in RTL ??"
In reply to: Andrew Lipscomb: "Re: unicode Digest V10 #106"
Next in thread: Dreiheller, Albrecht: "RE: Invalid code points"
Reply: Dreiheller, Albrecht: "RE: Invalid code points"

Andrew Lipscomb <ewwa at chattanooga dot net> wrote:

>> In particular, it would be great to know if the range U+0080, ...,
>> U+009F is invalid.
>
> Those code points (encoded properly) are valid. However, their
> appearance may indicate that an error occurred in processing, as the
> C1 controls would be rare in real Unicode text (and, with the
> exception of U+0085, are discouraged in XML). They most often arise by
> treating Windows-1252 as if it were ISO-Latin-1.
>
> In other words, not invalid, but suspicious.

But once again, this is a question of the accuracy or fidelity of the
input data, before it was converted to UTF-8. It has nothing to do with
the validity of the Unicode characters from U+0080 to U+009F, nor of
their UTF-8 representations.
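
To make the distinction concrete, here is a minimal Python sketch of the mislabeling Andrew describes. The sample bytes and the helper name has_c1_controls are my own assumptions for illustration, not anything taken from the original poster's data. It shows Windows-1252 bytes decoded as ISO 8859-1 turning into C1 controls, which then encode to perfectly well-formed UTF-8.

```python
# Sample input (assumed for illustration): curly quotes as Windows-1252 bytes.
raw = b"\x93quoted\x94"

# Mislabeled as ISO 8859-1: 0x93 and 0x94 become the C1 controls U+0093, U+0094.
wrong = raw.decode("latin-1")

# Decoded as intended: 0x93 and 0x94 become U+201C and U+201D.
right = raw.decode("cp1252")


def has_c1_controls(text):
    """Hypothetical helper: True if any code point lies in U+0080..U+009F."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


# The C1 controls are valid characters, so encoding to UTF-8 succeeds
# and the result round-trips without error.
utf8_wrong = wrong.encode("utf-8")
assert utf8_wrong.decode("utf-8") == wrong

print(has_c1_controls(wrong))  # suspicious, yet valid
print(has_c1_controls(right))  # nothing to flag
```

A check like this can only flag text as suspicious. It says nothing about whether the UTF-8 is valid, which in both cases it is.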

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


This archive was generated by hypermail 2.1.5 : Mon Jun 01 2009 - 22:42:25 CDT