From: Addison Phillips (firstname.lastname@example.org)
Date: Fri Mar 04 2005 - 12:23:41 CST
If the server is Apache, the declared encoding can be controlled in .htaccess or via file extensions. See:
Other servers offer similar kinds of control. It's still incredibly inconvenient, but certainly the UCD files should be served correctly.
Just out of curiosity, why *don't* all the UCD files use UTF-8?
Addison P. Phillips
Globalization Architect, Quest Software
Chair, Internationalization Core Working Group
Internationalization is not a feature.
It is an architecture.
> -----Original Message-----
> From: email@example.com [mailto:firstname.lastname@example.org] On
> Behalf Of Markus Scherer
> Sent: Friday, March 4, 2005 09:03
> To: email@example.com
> Subject: Re: Bad Content-type headers on Unicode web site?
> The problem is of course that web servers usually don't know which
> file has which encoding. A recent Apache update that made ISO-8859-1
> the default, and sent it rather than leaving the charset unspecified,
> is famous for wreaking havoc on other-charset content. There is a way
> to specify per-file meta data but that's a manual process and tends to
> get out of sync.
> You also can't declare the same charset for all UCD files because
> there are at least two in use (ISO-8859-1 and UTF-8) for different
> files.
> Unicode signatures might help, but are controversial, and may break
> UCD file parsers.
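To make the "may break UCD file parsers" point concrete, here is a minimal Python sketch. It assumes a semicolon-separated line in the style of UnicodeData.txt; the function names and the sample line are hypothetical, not taken from any real parser.

```python
import codecs

def first_field_naive(data: bytes) -> str:
    # Decodes as plain UTF-8, so a leading signature survives as U+FEFF
    # and ends up glued to the first field.
    line = data.decode("utf-8").splitlines()[0]
    return line.split(";")[0]

def first_field_signature_aware(data: bytes) -> str:
    # "utf-8-sig" drops one leading signature if present, and otherwise
    # behaves like plain UTF-8.
    line = data.decode("utf-8-sig").splitlines()[0]
    return line.split(";")[0]

# Hypothetical sample line with a UTF-8 signature in front.
sample = codecs.BOM_UTF8 + b"0041;LATIN CAPITAL LETTER A;Lu\n"
```

A parser that then calls int(field, 16) on the naive result would raise ValueError, because the field starts with U+FEFF instead of a hex digit.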
> It looks like there is no good solution. HTML and XML have mechanisms
> for internal charset declarations, but plain text doesn't. If you add
> some syntax, it becomes markup...
> I suppose the UCD files (the ones which are not in ISO-8859-1) could
> get a comment line with some syntax, and the web server could in
> principle parse the files and pick that up. That's a custom solution
> then. Or add the signature on the server and strip it while serving.
> (Production tool change.)
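A rough Python sketch of what such a custom serving step might look like, covering both ideas in one function. The "# charset: NAME" comment syntax, the five-line lookahead, and the function name are all assumptions made up for illustration; nothing like this exists in the UCD files or in any server today.

```python
import codecs
import re

# Hypothetical declaration syntax: a comment line such as "# charset: UTF-8".
CHARSET_COMMENT = re.compile(rb"^#\s*charset:\s*([A-Za-z0-9._-]+)\s*$")

def content_type_for(data: bytes, default: str = "ISO-8859-1"):
    """Return (Content-Type header value, body to send) for one file."""
    # Case 1: a stored signature marks the file as UTF-8; strip it while serving.
    if data.startswith(codecs.BOM_UTF8):
        return "text/plain; charset=UTF-8", data[len(codecs.BOM_UTF8):]
    # Case 2: look for the hypothetical comment line near the top of the file.
    for line in data.splitlines()[:5]:
        match = CHARSET_COMMENT.match(line)
        if match:
            charset = match.group(1).decode("ascii")
            return f"text/plain; charset={charset}", data
    # Case 3: nothing declared, so fall back to the default charset.
    return f"text/plain; charset={default}", data
```

The ISO-8859-1 fallback mirrors the suggestion that only the files not in ISO-8859-1 would need a declaration.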
> Opinions expressed here may not reflect my company's positions unless
> otherwise noted.