From: Erik van der Poel (erik@vanderpoel.org)
Date: Fri Mar 04 2005 - 12:41:18 CST
Markus Scherer wrote:
> I suppose the UCD files (the ones which are not in ISO-8859-1) could
> get a comment line with some syntax, and the web server could in
> principle parse the files and pick that up. That's a custom solution
> then. Or add the signature on the server and strip it while serving.
> (Production tool change.)
It might be a good idea to convert all of unicode.org's non-UTF-8 *.txt
files to UTF-8, if that wouldn't cause too many problems. I'm not very
familiar with the ways in which those *.txt files are used, so I may be
grossly underestimating the impact of such a change. If you can convert
all of them to UTF-8, then Apache probably also has a way to specify the
charset for *.txt site-wide. Alternatively, you might be able to do it
on a per-directory basis. Per-file might not be good since it could more
easily get out of sync, as you say. Look for Apache in:
http://www.w3.org/International/tutorials/tutorial-char-enc/#declaring
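As a rough illustration of the conversion step, here is a minimal Python
sketch. It assumes that any *.txt file which is not already valid UTF-8
is in ISO-8859-1; that assumption, and the names to_utf8 and
convert_tree, are mine and not anything unicode.org actually uses.

```python
from pathlib import Path


def to_utf8(raw, legacy_encoding="iso-8859-1"):
    """Return raw unchanged if it is already valid UTF-8, else transcode it.

    Assumption: bytes that fail to decode as UTF-8 are in legacy_encoding.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root):
    """Rewrite every non-UTF-8 *.txt file under root; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.txt")):
        raw = path.read_bytes()
        converted = to_utf8(raw)
        if converted != raw:
            path.write_bytes(converted)
            changed.append(str(path))
    return changed
```

Pure ASCII files pass through untouched, since ASCII is already valid
UTF-8. You would want to run something like this on a copy of the tree
first and review the list of changed files.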
Back in 1995 at Netscape, I thought we had the opportunity to force all
Web servers to specify the charset for non-iso-8859-1 text. We were
about to ship 2.0 with a number of new features, and I argued that this
might be a good time to treat a missing charset parameter as iso-8859-1.
Netscape had a large market share at that time. A lot of users would
have started to use 2.0, and the Web servers would have had to follow
suit. Or at least, that was my argument. The management said no. It is
actually difficult to tell whether that change would have been
successful. If it had led to a lot of pain and suffering, we might have
shipped 2.01 immediately, with that decision reversed.
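To make the proposed rule concrete, here is a small Python sketch of
"a missing charset parameter means iso-8859-1". It is only an
illustration of the rule as described above, not Netscape's code; the
function name effective_charset is hypothetical, and the header parsing
is deliberately simplified (it does not handle a ";" inside a quoted
parameter value).

```python
def effective_charset(content_type, default="iso-8859-1"):
    """Return the charset declared in a Content-Type value, else the default."""
    if not content_type:
        return default
    # Skip the media type itself; parameters follow the first ';'.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"')
            if value:
                return value.lower()
    # No usable charset parameter: fall back to the default, no guessing.
    return default
```

With this rule, "text/plain; charset=UTF-8" yields utf-8, while a bare
"text/plain" yields iso-8859-1, so a server sending anything else
without a label would show up as garbled text.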
So in this case, I don't think hindsight is 20-20, though it might have
worked if we had tried it earlier in the game, say, Netscape 1.1 or 1.0. It
is somewhat embarrassing to admit all this on an open mailing list, but
maybe there is a lesson to be learned from it.
A somewhat similar issue is that of strict HTML parsing. Mosaic and
Netscape are considered notorious in some circles for being so lax about
HTML syntax. Some people say that is why we have so much *garbage* out
there, and others say that is why we have *so much* "garbage" out there.
See the *emphasis* I have placed in that sentence.
The latter argument is that the Web exploded *because* it was so easy to
create content, even by hand.
There was one issue around that time, though, where Netscape actually
*did* decide to tighten the parser. We went round and round in circles,
trying to decide, and we finally bit that bullet, and it worked. I
forget the issue, but I think it had something to do with quotes in HTML
attributes.
Erik