Re: Undefined code positions in 8-bit character sets

From: Andreas Prilop (prilop2008@trashmail.net)
Date: Tue May 06 2008 - 11:24:43 CDT

    On Mon, 5 May 2008, David Starner wrote:

    > And, BTW, Andreas, I find links to long, sprawling threads to be less
    > than helpful. It would help if you post links to the direct spots in
    > those threads that bring up this issue, or summarize from them.

    The problem was/is:
    What to do when a byte 0x90 is found in a file that is

    (a) erroneously labelled charset=ISO-8859-1

    (b) erroneously labelled charset=Windows-1252

    (c) not labelled with any encoding/charset at all

    Surprisingly, the W3C validator gives up with Windows-1252
    but does perform a check with ISO-8859-1.
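
    The same asymmetry can be reproduced outside the validator. The
    following is only a sketch using Python's built-in codecs, which are
    an assumption here and not what the W3C validator itself runs:
    ISO-8859-1 assigns a character to every byte value, while Python's
    cp1252 table leaves 0x90 unassigned.

```python
data = b"\x90"

# ISO-8859-1 defines all 256 byte values, so decoding always succeeds;
# 0x90 becomes the C1 control character U+0090.
latin1_text = data.decode("iso-8859-1")

# Python's cp1252 codec has no mapping for 0x90, so strict decoding fails.
try:
    data.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

print(repr(latin1_text), cp1252_failed)
```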

    See the test document
     http://www.unics.uni-hannover.de/nhtcapri/test.htm
    and follow the links "Validate as ISO-8859-1" and
    "Validate as Windows-1252".

    The validation report with Windows-1252 would be more helpful,
    in my opinion, if 0x90 in cp1252 were mapped to something -
    to U+0090 or whatever.
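
    One way such a mapping might look, as a minimal sketch in Python:
    a custom error handler that maps every byte left undefined in cp1252
    to the code point with the same value, so 0x90 becomes U+0090. The
    handler name "c1_fallback" and the sample bytes are hypothetical,
    chosen only for illustration.

```python
import codecs

def c1_fallback(err):
    # Hypothetical handler: replace each undecodable byte with the
    # code point of the same numeric value (0x90 -> U+0090).
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]
        return "".join(chr(b) for b in bad), err.end
    raise err

codecs.register_error("c1_fallback", c1_fallback)

# 0xE9 and 0x80 are defined in cp1252; 0x90 is not.
data = b"caf\xe9 \x90 \x80"
text = data.decode("cp1252", errors="c1_fallback")
print(repr(text))
```

    With this fallback the rest of the file is still decoded as cp1252,
    and the stray byte survives as a control character that a checker
    could then report at its exact position.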


