RE: [Very much OT] Frequent incorrect guesses by the charset autodetection in IE7

From: Jony Rosenne (jr@qsm.co.il)
Date: Wed Jul 19 2006 - 11:45:21 CDT


    > -----Original Message-----
    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org] On Behalf Of Philippe Verdy
    > Sent: Wednesday, July 19, 2006 4:09 PM
    > To: Samuel Thibault
    > Cc: Cristian Secară; Sinnathurai Srivas;
    > asmusf@ix.netcom.com; unicode@unicode.org
    > Subject: Re: Frequent incorrect guesses by the charset
    > autodetection in IE7
    >
    >
    > From: "Samuel Thibault" <samuel.thibault@labri.fr>
    > >> Microsoft does support ISO-8859-x in its word processors,
    > using the support from the OS. Also indirectly through the
    > Windows codepages 125x which are extensions of ISO-8859-x
    > where C1 controls have been replaced by other characters (for
    > example Windows 1252 supports ISO-8859-1 except C1 controls,
    > >
    > > Yes, and that's where they begin to do funny things like inserting
    > > CP1252's single quotation mark in a text and then claiming
    > that this is
    > > ISO-8859-1... (just the same for the Euro symbol etc).
    >
    > This is an issue when you edit an existing ISO-8859-1 encoded
    > webpage and save it: instead of changing the charset in
    > the HTML code, FrontPage keeps the declaration but does not
    > encode the extra Windows-1252 characters as NCRs, as it
    > should. This is an old FrontPage bug, but I don't think it
    > affects Word, because Word creates documents that specify
    > Windows-1252 and not ISO-8859-1.
    >
    > Anyway, Internet Explorer interprets C1 control characters
    > found in an ISO-8859-1 page as if the page were encoded in
    > Windows-1252, given that C1 controls have no use in HTML
    > (except the Newline C1 control, not used outside of
    > EBCDIC-encoded texts created on IBM mainframes, such cases
    > being so rare on the web that I could not find any occurrence
    > of such a control in many years.
    >
    > I think that EBCDIC systems are extremely rare on the web, or
    > if they are used, they have the support for more common ISO
    > charsets in the system's I18N libraries so that web documents
    > are created without using EBCDIC, and if texts are extracted
    > from EBCDIC databases, they are first converted to an ISO
    > charset and the EBCDIC newlines are converted to LF or CR+LF
    > without using the C1 control). I wonder if there really
    > remains any EBCDIC system connected to the Net that cannot
    > correctly interpret LF or CR+LF encoded newlines for texts,
    > given that CR+LF is required for MIME and HTTP-based
    > protocols, and even to communicate with most terminals, which
    > no longer use the EBCDIC dinosaur (who uses TN3270 today?
    > Doesn't everybody use Telnet now, given that old terminals have
    > been thrown out and replaced by desktop PCs and workstations?
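
    To make the quoted point about C1 bytes and NCRs concrete, here is a
    rough Python sketch. It is only an illustration: the sample bytes and
    the helper names are made up, and it says nothing about how FrontPage
    or Internet Explorer are actually implemented.

```python
# Bytes as a Windows-1252 editor might write them:
# 0x92 is a curly apostrophe and 0x80 is the euro sign in Windows-1252.
raw = b"It\x92s 5 \x80"

# Strict ISO-8859-1 reading: 0x80-0x9F become C1 control characters.
as_latin1 = raw.decode("iso-8859-1")
# Browser-style reading of the same bytes as Windows-1252.
as_cp1252 = raw.decode("cp1252")


def has_c1_controls(text):
    """Return True if the text contains any C1 control (U+0080..U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


def save_as_latin1_with_ncrs(text):
    """Encode as ISO-8859-1; characters outside it become numeric character references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


print(has_c1_controls(as_latin1))           # the strict reading yields controls
print(as_cp1252)                            # the lenient reading yields punctuation
print(save_as_latin1_with_ncrs(as_cp1252))  # what a correct save could look like
```

    With NCRs in place, the page can keep its ISO-8859-1 declaration and
    still contain no bytes in the 0x80-0x9F range.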

    I use TN3270 very often, with code page 424 on an EBCDIC mainframe and 1255
    on the PC, and it works quite well.
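
    As a minimal sketch of the text conversion involved, assuming Python's
    built-in cp424 and cp1255 codecs are an acceptable stand-in for the
    real translation tables (the function names are hypothetical, and a
    real 3270 data stream also carries orders and field attributes that
    this ignores):

```python
def ebcdic_to_pc(data: bytes) -> bytes:
    """Convert Hebrew EBCDIC text (code page 424) to Windows code page 1255."""
    return data.decode("cp424").encode("cp1255")


def pc_to_ebcdic(data: bytes) -> bytes:
    """Convert Windows code page 1255 text to Hebrew EBCDIC (code page 424)."""
    return data.decode("cp1255").encode("cp424")


# "shalom" written with Unicode escapes so the source file stays ASCII.
hebrew = "\u05e9\u05dc\u05d5\u05dd"

host_bytes = hebrew.encode("cp424")   # as the mainframe would store it
pc_bytes = ebcdic_to_pc(host_bytes)   # as the PC would store it

print(host_bytes.hex(), pc_bytes.hex())
print(pc_bytes.decode("cp1255") == hebrew)
```

    Both directions go through Unicode, so any character that exists in
    only one of the two code pages raises an error instead of being
    silently replaced.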

    I also have developed some PC software that communicates with the mainframe
    over TN3270 using TCP/IP sockets directly, without 3270 emulation software.

    EBCDIC is still common in large organizations requiring high performance and
    reliability and equipped with millions of lines of legacy code, such as
    banks and government.

    Jony

    > Even workstations are disappearing now, except for special
    > cases like engineering graphics and game consoles, due to
    > multi-tiered or collaborative architectures)


