Re: Undefined code positions in 8-bit character sets

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 06 2008 - 15:36:02 CDT


    Andreas Prilop wrote on Tuesday, May 06, 2008 5:24 PM

    > The problem was/is:
    > What to do when a byte 0x90 is found in a file that has
    >
    > (a) erroneously charset=ISO-8859-1
    >
    > (b) erroneously charset=Windows-1252
    >
    > (c) no encoding/charset at all specified
    >
    > Surprisingly, the W3C validator gives up with Windows-1252
    > but does perform a check with ISO-8859-1.

    It's not surprising at all. These charset designations have the *IANA*
    definitions, which are not necessarily identical to international (e.g.
    ISO-8859 series) or national (e.g. TIS-620) standards. Thus 0x90 is
    undefined for Windows-1252 but merely an illegal character for HTML in the
    IANA definition of ISO-8859-1.
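
    A minimal sketch of the distinction, assuming Python's built-in codec
    tables as a stand-in for the IANA definitions (they are not the registry
    itself, and the helper name classify_byte is purely illustrative):

```python
def classify_byte(byte_value: int, charset: str) -> str:
    """Classify a single byte under a charset label using Python's codecs.

    Assumption: Python's codec tables approximate the IANA definitions.
    """
    try:
        ch = bytes([byte_value]).decode(charset)
    except UnicodeDecodeError:
        # The codec has no mapping for this byte at all.
        return "undefined"
    if 0x80 <= ord(ch) <= 0x9F:
        # Decodes to a C1 control code (U+0080..U+009F).
        return "C1 control"
    return "character"


print(classify_byte(0x90, "iso-8859-1"))
print(classify_byte(0x90, "windows-1252"))
```

    Under this assumption the ISO-8859-1 label yields a C1 control code
    (defined, though not a legal character in an HTML document), while the
    Windows-1252 label has no mapping for the byte, so a checker cannot
    decode the file at all.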

    Richard.


