RE: FYI: Google blog on Unicode

From: verdy_p (
Date: Mon Feb 08 2010 - 04:17:09 CST


    What is the default encoding of Apache itself?
    Except for the messages that the server generates itself for the system administrator (for example in system
    logs), Apache has no defaults and will just publish the pages in whatever encoding the web designer used for them.
    If the web designer specifies no default in the pages (or in the HTTP settings files added to the content
    repository), Apache will not add any charset parameter to the Content-Type header of its HTTP replies.
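    A client can only rely on a declared charset if the reply actually carries one. (In Apache 2.x, the
    AddDefaultCharset directive is off unless an administrator enables it.) The sketch below uses Python's standard
    email.message module to read the charset parameter from a Content-Type value; the helper name declared_charset
    is hypothetical, and a None result is the situation described above, where the browser is left to guess.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when no charset is declared.
    return msg.get_content_charset()


print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html"))                 # None
```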

    If there are apparent defaults, they will most often come from the server plugins installed on top of Apache, or
    from the underlying OS (for example in the way it encodes the local file names, if that OS specifies a codepage in
    its APIs).

    So browsers will receive pages in undeclared encodings and will have to "guess" them or fall back on the user's
    settings.
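    The kind of guessing a browser is forced into can be sketched as a simple fallback chain. This is a hypothetical
    heuristic for illustration only, not the algorithm any particular browser implements: check for a UTF-8 byte
    order mark, then try strict UTF-8, then use the encoding from the user's settings (assumed here to be
    windows-1252).

```python
import codecs


def guess_decode(body, user_fallback="windows-1252"):
    """Hypothetical heuristic: BOM first, then strict UTF-8, then the user's fallback encoding."""
    # 1. A UTF-8 byte order mark identifies the encoding unambiguously.
    if body.startswith(codecs.BOM_UTF8):
        return body[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    # 2. Bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        return body.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise use the encoding configured in the user's settings;
    #    undefined bytes become U+FFFD instead of raising an error.
    return body.decode(user_fallback, errors="replace"), user_fallback


print(guess_decode("café".encode("utf-8")))       # ('café', 'utf-8')
print(guess_decode("café".encode("iso-8859-1")))  # ('café', 'windows-1252')
```

    The guess only works because the sample is unambiguous; a page whose bytes happen to be valid in several
    encodings can still be misread.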

    There's nothing to change in Apache. It's up to web designers to be careful about the design of their pages, up
    to web administrators to configure the Apache environment correctly if they want to enforce a default, and up to
    server-side script authors to provide a framework for their CMS that lets web designers and authors publish their
    work or data correctly with a predictable encoding, along with a test framework to make sure that no encoding
    errors will happen.
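    Such a test framework could start with a check on every rendered page. The sketch below assumes the site's
    intended output encoding is UTF-8; the function name check_page_encoding is hypothetical. It reports bytes that
    are not valid UTF-8 as well as U+FFFD characters that an earlier, lossy conversion may already have baked into
    the content.

```python
def check_page_encoding(body):
    """Return a list of encoding problems found in a rendered page (empty list means OK)."""
    problems = []
    try:
        text = body.decode("utf-8")  # strict decoding: any invalid byte raises an error
    except UnicodeDecodeError as exc:
        problems.append(f"invalid UTF-8 at byte {exc.start}")
        return problems
    # A replacement character usually means data was already damaged upstream.
    if "\ufffd" in text:
        problems.append(f"U+FFFD found at character {text.index(chr(0xFFFD))}")
    return problems


print(check_page_encoding("Prénom".encode("utf-8")))  # []
print(check_page_encoding(b"caf\xe9"))                # ['invalid UTF-8 at byte 3']
```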

    I still see lots of web pages that mix several distinct encodings on the same page: for example a form
    generated by a plugin or standard script in UTF-8, a second form generated by another plugin or standard script
    in ISO 8859-1, plus the page headers/footers/menus in whatever encoding the web designer created them. It is of
    course impossible to guess or set any single encoding that will match all the content displayed in the SAME HTML
    document (i.e. not in a separate frame), and users will see the U+FFFD replacement character in their browsers.

    Such errors occur when a web site has decided to change the default encoding of the general framework, but the data
    coming from a CMS repository (or from a database) has not been migrated to the new encoding (UTF-8 most often), and
    nothing has been done to mark the old data (ISO 8859-* most often) so that the framework can transparently
    transcode it to UTF-8 on the fly.
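    Marking the old data can be as simple as storing the encoding next to each record and transcoding at output
    time. The sketch below is a hypothetical data model (the StoredText class and to_utf8 helper are illustrative
    names, not part of any CMS), and it assumes unmigrated rows were written in ISO 8859-1.

```python
from dataclasses import dataclass


@dataclass
class StoredText:
    """Hypothetical record: raw bytes plus the encoding they were written in."""
    raw: bytes
    encoding: str = "iso-8859-1"  # assumption: unmigrated legacy rows use the old encoding


def to_utf8(record):
    """Transcode a stored record to UTF-8 bytes on the fly, before it is sent to the page."""
    if record.encoding.lower() in ("utf-8", "utf8"):
        return record.raw  # already in the target encoding, pass through unchanged
    return record.raw.decode(record.encoding).encode("utf-8")


legacy = StoredText(b"caf\xe9")                    # old ISO 8859-1 row
migrated = StoredText("café".encode("utf-8"), "utf-8")
print(to_utf8(legacy) == to_utf8(migrated))        # True
```

    Keeping the conversion in one place lets the old rows be migrated later without touching the templates.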

    So you can't change the defaults in an existing system without some development work and integration testing of
    the various components that are assembled to make up that system.

    > Message of 29/01/10 18:55
    > From: "Jonathan Rosenne"
    > To: "'Unicode Mailing List'"
    > Cc:
    > Subject: RE: FYI: Google blog on Unicode
    > Don't be so haughty - nobody changes defaults without good reason and understanding what it means.
    > Jony
    > > -----Original Message-----
    > > From: [] On
    > > Behalf Of Ed Trager
    > > Sent: Friday, January 29, 2010 4:29 PM
    > > To: Unicode Mailing List
    > > Subject: Re: FYI: Google blog on Unicode
    > >
    > > On Thu, Jan 28, 2010 at 11:30 PM, Curtis Clark
    > > wrote:
    > > > On 2010-01-28 11:16, Ed Trager wrote:
    > > >>
    > > >> Now I just wish that the Apache people would make UTF-8 the
    > > >> *default*, *out-of-the-box* encoding for the Apache web server.
    > > >
    > > > Hear, hear! In my experience, character-code-clueless sysadmins never
    > > > like to change the defaults.
    > >
    > > Yes - especially the sysadmins at clueless ISPs.
    > >
    > >
    > > >
    > > > --
    > > > Curtis Clark
    > > > Director, I&IT Web Development +1 909 979 6371
    > > > University Web Coordinator, Cal Poly Pomona
    > > >
    > > >
    > >

    This archive was generated by hypermail 2.1.5 : Mon Feb 08 2010 - 04:19:52 CST