RE: Bad Content-type headers on Unicode web site?

From: Addison Phillips (addison.phillips@quest.com)
Date: Fri Mar 04 2005 - 12:23:41 CST


    If the server is an Apache server, the encoding can be controlled in .htaccess or via file extensions. See:

    http://www.w3.org/International/questions/qa-htaccess-charset

    Other servers offer similar kinds of control. It's still incredibly inconvenient, but certainly the UCD files should be served correctly.
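
    Extension-based charset selection can be sketched in Python, similar in spirit to Apache's `AddCharset` directive. The `.utf8`/`.latin1` extensions and the `CHARSET_BY_EXTENSION` table are hypothetical, not the UCD's actual file naming. When no extension matches, the sketch omits the charset parameter rather than sending a server-wide default.

```python
import mimetypes
from pathlib import PurePath

# Hypothetical extension-to-charset table (illustrative names only),
# similar in spirit to Apache's AddCharset directive.
CHARSET_BY_EXTENSION = {
    ".utf8": "UTF-8",
    ".latin1": "ISO-8859-1",
}


def content_type_for(filename: str) -> str:
    """Build a Content-Type value from a file name's extensions."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    charset = None
    # Check every suffix, so "Blocks.utf8.txt" yields both ".utf8" and ".txt".
    for suffix in PurePath(filename).suffixes:
        charset = CHARSET_BY_EXTENSION.get(suffix.lower(), charset)
    # Leave the charset unspecified instead of guessing a default.
    return f"{mime}; charset={charset}" if charset else mime


if __name__ == "__main__":
    for name in ("Blocks.utf8.txt", "NamesList.latin1.txt", "ReadMe.txt"):
        print(name, "->", content_type_for(name))
```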

    Just out of curiosity, why *don't* all the UCD files use UTF-8?

    Addison P. Phillips
    Globalization Architect, Quest Software
    http://www.quest.com

    Chair, Internationalization Core Working Group
    http://www.w3.org/International

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Markus Scherer
    > Sent: Friday, March 4, 2005 09:03
    > To: unicode@unicode.org
    > Subject: Re: Bad Content-type headers on Unicode web site?
    >
    > The problem is of course that web servers usually don't know which
    > file has which encoding. A recent Apache update that made ISO-8859-1
    > the default, and sent it rather than leaving the charset unspecified,
    > is famous for wreaking havoc on other-charset content. There is a way
    > to specify per-file meta data but that's a manual process and tends to
    > get out of sync.
    >
    > You also can't declare the same charset for all UCD files because
    > there are at least two in use (ISO-8859-1 and UTF-8) for different
    > files.
    >
    > Unicode signatures might help, but are controversial, and may break
    > UCD file parsers.
    >
    > It looks like there is no good solution. HTML and XML have mechanisms
    > for internal charset declarations, but plain text doesn't. If you add
    > some syntax, it becomes markup...
    >
    > I suppose the UCD files (the ones which are not in ISO-8859-1) could
    > get a comment line with some syntax, and the web server could in
    > principle parse the files and pick that up. That's a custom solution
    > then. Or add the signature on the server and strip it while serving.
    > (Production tool change.)
    >
    > markus
    >
    > --
    > Opinions expressed here may not reflect my company's positions unless
    > otherwise noted.
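
    The failure mode described in the quoted message (UTF-8 text labelled as ISO-8859-1) and a crude way to tell the two UCD encodings apart can be sketched in Python. `guess_ucd_charset` is a hypothetical heuristic, not a Unicode Consortium tool: it assumes every file is either UTF-8 or ISO-8859-1, and it can misclassify ISO-8859-1 text whose bytes happen to form valid UTF-8 sequences.

```python
def guess_ucd_charset(data: bytes) -> str:
    """Hypothetical heuristic: UTF-8 if the bytes decode strictly, else ISO-8859-1.

    Assumes the file uses one of the two encodings found in the UCD. Pure ASCII
    is reported as UTF-8, which is harmless because both decode it identically.
    """
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError:
        return "ISO-8859-1"
    return "UTF-8"


if __name__ == "__main__":
    utf8_bytes = "café".encode("utf-8")
    # What a client displays when UTF-8 bytes are labelled ISO-8859-1:
    print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
    print(guess_ucd_charset(utf8_bytes))  # UTF-8
    print(guess_ucd_charset("café".encode("iso-8859-1")))  # ISO-8859-1
```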

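    To illustrate why a Unicode signature (BOM) may break UCD parsers, as the quoted message warns: a parser that decodes as plain UTF-8 sees U+FEFF glued to the first field. In Python, the `utf-8-sig` codec removes a leading BOM if present and otherwise behaves like `utf-8`. `first_code_point` is a hypothetical helper that assumes a semicolon-separated, UnicodeData.txt-style line.

```python
import codecs


def first_code_point(data: bytes) -> int:
    """Return the hex code point in the first field of the first line.

    Hypothetical helper assuming a "XXXX;..." layout. The utf-8-sig codec
    drops a leading BOM if there is one and otherwise acts like utf-8.
    """
    first_line = data.decode("utf-8-sig").splitlines()[0]
    return int(first_line.split(";", 1)[0], 16)


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + "00E9;LATIN SMALL LETTER E WITH ACUTE\n".encode("utf-8")
    # Plain utf-8 keeps the signature as U+FEFF at the start of the first
    # field, so a parser expecting bare hex digits no longer matches.
    print(repr(raw.decode("utf-8").split(";", 1)[0]))
    print(hex(first_code_point(raw)))  # 0xe9
```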

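    Both server-side ideas from the quoted message (a charset comment line that the server parses, or a signature stored on the server and stripped while serving) can be combined in one hypothetical handler, sketched in Python. The `# charset: ...` comment syntax and `prepare_response` are invented for illustration; the UCD defines no such convention.

```python
import codecs
import re
from typing import Optional, Tuple

# Invented comment syntax for illustration only; the UCD defines no such line.
CHARSET_COMMENT = re.compile(rb"#\s*charset:\s*([A-Za-z0-9._-]+)\s*")


def prepare_response(data: bytes, default: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, content_type) for a plain-text file stored on the server."""
    if data.startswith(codecs.BOM_UTF8):
        # Signature stored on the server: use it, then strip it so that
        # clients and UCD parsers never see U+FEFF.
        return data[len(codecs.BOM_UTF8):], "text/plain; charset=UTF-8"
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    match = CHARSET_COMMENT.fullmatch(first_line)
    if match:
        charset = match.group(1).decode("ascii")
        return data, f"text/plain; charset={charset}"
    if default is not None:
        return data, f"text/plain; charset={default}"
    # No declaration found: leave the charset unspecified.
    return data, "text/plain"
```

    The comment-line branch leaves the body untouched, since many UCD files already treat `#` lines as comments; the signature branch must rewrite the body on every request, which is the production-tool change the message mentions.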

    This archive was generated by hypermail 2.1.5 : Fri Mar 04 2005 - 12:25:00 CST