Re: Undefined code positions in 8-bit character sets

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 06 2008 - 15:36:02 CDT

Next message: William J Poser: "Re: query regarding proposed additions to Canadian Aboriginal Syllabics"

Previous message: Mark Davis: "Re: (OT) The Importance of Getting the Casing Right (was: Freedom to Normalise)"
In reply to: Andreas Prilop: "Re: Undefined code positions in 8-bit character sets"
Next in thread: Philippe Verdy: "RE: Undefined code positions in 8-bit character sets"

Andreas Prilop wrote on Tuesday, May 06, 2008 5:24 PM

> The problem was/is:
> What to do when a byte 0x90 is found in a file that has
>
> (a) erroneously charset=ISO-8859-1
>
> (b) erroneously charset=Windows-1252
>
> (c) no encoding/charset at all specified
>
> Surprisingly, the W3C validator gives up with Windows-1252
> but does perform a check with ISO-8859-1.

It's not surprising at all. These charset designations have the *IANA*
definitions, which are not necessarily identical to the international (e.g.
ISO-8859 series) or national (e.g. TIS-620) standards. Thus 0x90 is
undefined in Windows-1252, whereas in the IANA definition of ISO-8859-1 it
is a defined character that is merely illegal in HTML.
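
The distinction can be sketched in a few lines of Python. This is only an
illustration: it assumes Python's built-in `cp1252` and `latin-1` codecs are
reasonable stand-ins for the IANA Windows-1252 and ISO-8859-1 definitions,
and `classify_byte` is a hypothetical helper, not what the W3C validator
actually runs.

```python
import unicodedata


def classify_byte(byte_value, charset):
    """Classify one byte as 'undefined', 'control', or 'printable' in charset.

    Hypothetical helper for illustration; relies on Python's codec tables,
    which are assumed here to match the IANA charset definitions.
    """
    try:
        ch = bytes([byte_value]).decode(charset)
    except UnicodeDecodeError:
        # The codec has no mapping for this byte at all.
        return "undefined"
    # General category "Cc" covers the C0 and C1 control characters.
    if unicodedata.category(ch) == "Cc":
        return "control"
    return "printable"


print(classify_byte(0x90, "cp1252"))   # no mapping in the code page
print(classify_byte(0x90, "latin-1"))  # maps to the C1 control U+0090
```

In the first case a checker has nothing to decode the byte to; in the second
it gets a real character and can go on to report that the character is not
allowed in HTML.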

Richard.

