From: Markus Scherer (markus.icu@gmail.com)
Date: Fri Mar 04 2005 - 11:03:23 CST
The problem, of course, is that web servers usually don't know which
file has which encoding. A recent Apache update that made ISO-8859-1
the default, and sent it rather than leaving the charset unspecified,
is famous for wreaking havoc on content in other charsets. There is a
way to specify per-file metadata, but that's a manual process and tends
to get out of sync.
You also can't declare the same charset for all UCD files because
there are at least two in use (ISO-8859-1 and UTF-8) for different
files.
Unicode signatures (BOMs) might help, but they are controversial and
may break UCD file parsers.
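To illustrate the parser concern, here is a minimal Python sketch. The
sample data and the helper name strip_utf8_signature are made up for
illustration and are not taken from any real UCD tool. It shows how a
UTF-8 signature hides the leading "#" of a comment line from a naive
check, and how stripping the signature first avoids that.

```python
import codecs
from typing import Tuple


def strip_utf8_signature(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, had_signature) for the raw bytes of a file."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Hypothetical UCD-style content: a comment line, then a data line.
raw = codecs.BOM_UTF8 + b"# Sample-1.0.txt\n0041;LATIN CAPITAL LETTER A\n"

# The plain "utf-8" codec keeps the signature as U+FEFF in the text,
# so a parser that tests line.startswith("#") no longer sees a comment.
text_with_signature = raw.decode("utf-8")

# Stripping the signature before decoding restores the expected first line.
payload, had_signature = strip_utf8_signature(raw)
text_without_signature = payload.decode("utf-8")
```

Python's "utf-8-sig" codec would drop the signature while decoding, but
existing parsers written against signature-free files won't necessarily
use anything like it.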
It looks like there is no good solution. HTML and XML have mechanisms
for internal charset declarations, but plain text doesn't. If you add
some syntax, it becomes markup...
I suppose the UCD files (the ones which are not in ISO-8859-1) could
get a comment line with some syntax, and the web server could in
principle parse the files and pick that up. That would be a custom
solution, though. Or add the signature to the files on the server and
strip it while serving. (That means a production tool change.)
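A rough sketch of the comment-line idea, in Python. Everything here is
an assumption for illustration: the "# charset: NAME" syntax, the
function name content_type_for, and the choice to look only at the
first few lines. No such convention exists in the UCD files today.

```python
import re

# Hypothetical declaration syntax: a comment line such as "# charset: UTF-8".
DECLARATION = re.compile(rb"^#\s*charset:\s*([A-Za-z0-9._-]+)\s*$")


def content_type_for(data: bytes,
                     default: str = "ISO-8859-1",
                     max_lines: int = 5) -> str:
    """Build a Content-Type value from a declaration near the top of a file.

    Files without a declaration fall back to the default charset, matching
    the idea that only the non-ISO-8859-1 files would carry the comment.
    """
    for line in data.splitlines()[:max_lines]:
        match = DECLARATION.match(line)
        if match:
            return "text/plain; charset=" + match.group(1).decode("ascii")
    return "text/plain; charset=" + default


declared = content_type_for(b"# Sample-1.0.txt\n# charset: UTF-8\n0041;A\n")
undeclared = content_type_for(b"# Sample-1.0.txt\n0041;A\n")
```

The declaration is matched on raw bytes, so the server does not need to
know the encoding before it has read the line. That only works as long
as the declaration itself stays in ASCII.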
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.