AddDefaultCharset considered harmful (was: Mojibake on my Web pages)

From: Martin Duerst (duerst@w3.org)
Date: Thu Sep 25 2003 - 16:32:27 EDT


    Hello Doug, others,

    Here is what I think is the most probable explanation:
    Adelphia recently upgraded to Apache 2.0. The core config file (httpd.conf)
    as distributed contains an entry
         AddDefaultCharset iso-8859-1
    which does what you have described. They probably adopted this
    because the comment in the config file suggests that it's important.

    I have just filed a bug with bugzilla, asking that this default
    setting be removed or commented out, and the comment fixed, at
    http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421. You may
    want to vote for that bug.

    I have also commented on a related bug that I found, at
    http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513.

    I suggest you ask your Internet provider to:
    1) change the setting to AddDefaultCharset Off
        (or simply comment the line out)
    2) make sure you get FileInfo override permission in your directories,
        so that you can make the settings you know are correct.
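
    A minimal sketch of those two changes, assuming the provider keeps the
    stock httpd.conf layout. The directory pattern is only an illustrative
    guess, not Adelphia's actual setup, and the .htaccess line is one
    possible way to label your own pages:

    ```apache
    # httpd.conf (provider side)
    # Stop attaching a charset parameter to text/html and text/plain
    # responses that do not already declare one.
    AddDefaultCharset Off

    # Hypothetical per-user web directory; the real path will differ.
    <Directory "/home/*/public_html">
        # Let users set MIME type and charset directives in .htaccess.
        AllowOverride FileInfo
    </Directory>
    ```

    ```apache
    # .htaccess (in your own directory; needs AllowOverride FileInfo)
    # Label .html files as UTF-8 in the Content-Type header.
    AddCharset utf-8 .html
    ```

    Because AddDefaultCharset only applies to responses that carry no
    charset yet, the .htaccess line should take effect even if the provider
    leaves the server-wide default in place, as long as FileInfo overrides
    are allowed.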

    The comment in the config file contains mostly very strange statements:

    >>>>>>>>
    #
    # Specify a default charset for all pages sent out. This is
    # always a good idea and opens the door for future internationalisation
    # of your web site, should you ever want it. Specifying it as
    # a default does little harm; as the standard dictates that a page
    # is in iso-8859-1 (latin1) unless specified otherwise i.e. you
    # are merely stating the obvious. There are also some security
    # reasons in browsers, related to javascript and URL parsing
    # which encourage you to always set a default char set.
    #
    AddDefaultCharset ISO-8859-1
    >>>>>>>

    If anybody knows something about these security issues, please
    tell me (any mention of security issues usually carries a lot of
    weight with webmasters, for good reasons).

    Regards, Martin.

    At 22:40 03/09/22 -0700, Doug Ewell wrote:
    >Apologies in advance to anyone who visits my Web site and sees garbage
    >characters, a.k.a. "mojibake." It isn't my fault.
    >
    >Adelphia is currently having a character-set problem with their HTTP
    >servers. Apparently they are serving all pages as ISO 8859-1 even if
    >they are marked as being encoded in another character set, such as
    >UTF-8.

    >If you manually change the encoding in your browser to UTF-8, or
    >download the page and display it as a local file, everything looks fine
    >because Adelphia's server is no longer calling the shots. Their tech
    >support people acknowledge that the problem is at their end and said
    >they would look into it.
    >
    >I understand that having the "Unicode Encoded" logo on my page next to
    >these garbage characters may not reflect well on Unicode, especially to
    >newbies. I'm considering putting a disclaimer at the top of my pages,
    >but I'm waiting to see how quickly they solve the problem.
    >
    >-Doug Ewell
    > Fullerton, California
    > http://users.adelphia.net/~dewell/


