Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 19 2006 - 09:09:09 CDT

    From: "Samuel Thibault" <samuel.thibault@labri.fr>
    >> Microsoft does support ISO-8859-x in its word processors, using the support from the OS. Also indirectly through the Windows codepages 125x which are extensions of ISO-8859-x where C1 controls have been replaced by other characters (for example Windows 1252 supports ISO-8859-1 except C1 controls,
    >
    > Yes, and that's where they begin to do funny things like inserting
    > CP1252's single quotation mark in a text and then claiming that this is
    > ISO-8859-1... (just the same for the Euro symbol etc).

    This is an issue when you edit an existing ISO-8859-1 encoded web page and save it: instead of changing the charset in the HTML code, FrontPage keeps the declaration but does not encode the extra Windows-1252 characters as NCRs, as it should. This is an old FrontPage bug, but I don't think it affects Word, because Word creates documents that specify Windows-1252 and not ISO-8859-1.
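
To make the expected behaviour concrete, here is a minimal Python sketch of what "encode the extra characters as NCRs" means when the declared charset stays ISO-8859-1. The function name and the sample string are made up for illustration; this is not FrontPage's code, only one way to get the correct output.

```python
def encode_latin1_with_ncrs(text: str) -> bytes:
    """Encode text for a page declared as ISO-8859-1.

    Characters that ISO-8859-1 cannot represent (for example the Euro
    sign or the curly quotes found in Windows-1252) are written as
    decimal numeric character references instead of raw bytes.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# Hypothetical sample: Euro sign and right single quotation marks.
sample = "price: 10 \u20ac, \u2019quoted\u2019"

correct = encode_latin1_with_ncrs(sample)

# What the bug described above amounts to: Windows-1252 bytes written
# under an ISO-8859-1 declaration (0x80 and 0x92 land in the C1 range).
buggy = sample.encode("cp1252")

print(correct)
print(buggy)
```

With the first form, every byte in the file is valid ISO-8859-1 and no byte falls in the 0x80-0x9F range, so the declaration remains truthful.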

    Anyway, Internet Explorer interprets C1 control characters found in an ISO-8859-1 page as if the page were encoded in Windows-1252, given that C1 controls have no use in HTML (except the C1 newline control NEL, U+0085, which is not used outside of EBCDIC-encoded texts created on IBM mainframes, a case so rare on the web that I have not found any occurrence of this control in many years.
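
The fallback described here can be sketched in a few lines of Python. This is only an illustration of the behaviour as described in this message, not Internet Explorer's actual implementation; the function name is hypothetical, and replacing the five byte values that Python's cp1252 codec leaves undefined with U+FFFD is my own assumption.

```python
def decode_declared_latin1(raw: bytes) -> str:
    """Decode bytes from a page that declares ISO-8859-1.

    If any byte falls in the C1 range (0x80-0x9F), assume the author
    really meant Windows-1252 and decode accordingly. Otherwise the
    two charsets agree and plain ISO-8859-1 decoding is used.
    """
    has_c1_bytes = any(0x80 <= b <= 0x9F for b in raw)
    if has_c1_bytes:
        # Assumption: bytes undefined in cp1252 become U+FFFD.
        return raw.decode("cp1252", errors="replace")
    return raw.decode("iso-8859-1")


# 0x80 and 0x92 are C1 controls in ISO-8859-1, but the Euro sign and
# the right single quotation mark in Windows-1252.
print(decode_declared_latin1(b"10 \x80 \x92ok\x92"))
print(decode_declared_latin1(b"caf\xe9"))
```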

    I think that EBCDIC systems are extremely rare on the web, or if they are used, they have support for the more common ISO charsets in the system's I18N libraries, so that web documents are created without using EBCDIC, and if texts are extracted from EBCDIC databases, they are first converted to an ISO charset and the EBCDIC newlines are converted to LF or CR+LF without using the C1 control). I wonder if there really remains any EBCDIC system connected to the Net that cannot correctly interpret LF or CR+LF encoded newlines in texts, given that CR+LF is required for MIME and HTTP-based protocols, and even to communicate with most terminals, which no longer use the EBCDIC dinosaur (who uses TN3270 today? Doesn't everybody use Telnet now, given that old terminals have been thrown away and replaced by desktop PCs and workstations? Even workstations are disappearing now, except for special cases like engineering graphics and game consoles, due to multi-tiered or collaborative architectures).
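
As a rough sketch of the conversion step mentioned above (EBCDIC text converted to an ISO charset, with the newline control mapped to LF), something like the following would do. The choice of Python's cp037 codec as the EBCDIC code page is an assumption for the example, as is the function name; real systems may use a different code page.

```python
NEL = "\u0085"  # the C1 "next line" control that EBCDIC newlines map to


def ebcdic_to_latin1(raw: bytes, codec: str = "cp037") -> bytes:
    """Convert EBCDIC-encoded text to ISO-8859-1 with LF newlines.

    `codec` is an assumed EBCDIC code page (cp037 here). After decoding,
    every NEL control is replaced by LF so that no C1 control ends up
    in the ISO-8859-1 output.
    """
    text = raw.decode(codec)
    text = text.replace(NEL, "\n")
    return text.encode("iso-8859-1")


# Build a small EBCDIC sample whose line separator is NEL.
ebcdic_sample = ("line one" + NEL + "line two").encode("cp037")
print(ebcdic_to_latin1(ebcdic_sample))
```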


