Re: Numerical character references in HTML (was: Multilingual Documents)

From: Erik van der Poel (erik@netscape.com)
Date: Mon Dec 06 1999 - 16:19:45 EST


Otto Stolz wrote:
>
> On 1999-12-03 at 10:11, Erik van der Poel wrote:
> > Even though [Netscape 4.X] doesn't support UTF-8 properly.
>
> On the contrary, I am under the impression that Netscape 4.x does a
> remarkably good job in supporting UTF-8

I was referring to the problem discussed earlier in this thread, where
Web site developers cannot assume that non-Times/Courier users (e.g.
Japanese) have set their Unicode font properly. The default is
Times/Courier, which don't support Japanese, and the Win32 version
doesn't do font switching.

> However, there is an annoying deficiency in support of non-UTF encodings.
>
> As said before in this thread, the document character set of any HTML 4
> file is UCS, cf. <http://www.w3.org/TR/REC-html40/charset.html>. Hence,
> any UCS character may be specified by a numerical character reference
> (NCR), regardless of the content transfer encoding, cf.
> <http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1>. This would come in
> handy for including an occasional extra character, e.g. from the
> General Punctuation block, in an otherwise 8-bit coded document.
>
> However, this does not work with Netscape 4.x: though these browsers know
> how to find glyphs for arbitrary UCS characters (if locally available, at
> all), they apply this knowledge only to UTF-8 encoded files, cf. my
> examples <http://www.rz.uni-konstanz.de/y2k/test/Go-Latin.htm> vs.
> <http://www.rz.uni-konstanz.de/y2k/test/Go-UTF.htm>.
>
> Also, Netscape 4.x browsers do not recognize hexadecimal NCRs, cf.
> my examples
> <http://www.rz.uni-konstanz.de/y2k/test/Euro-UTF.htm>,
> <http://www.rz.uni-konstanz.de/y2k/test/Euro-Latin-9.htm>, and again
> <http://www.rz.uni-konstanz.de/y2k/test/Go-Latin.htm> and
> <http://www.rz.uni-konstanz.de/y2k/test/Go-UTF.htm>.
>
> (Btw., the Euro-Latin-9.htm file is encoded in ISO 8859-15, which only
> the most recent Netscape versions (4.7, I think) process properly,
> whilst IE 5.0 still does not recognize ISO 8859-15.)
>
> Erik: Which version of Netscape will mend those errors?

Netscape 4.X is unlikely to support NCRs properly in documents
transmitted in non-Unicode encodings. The architecture doesn't make it
easy to do this in a way that would maintain the performance (speed) and
keep documents looking good on the screen.
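
To illustrate what "NCRs in documents transmitted in non-Unicode
encodings" means on the authoring side, here is a minimal Python sketch
(it is not Netscape or Mozilla code). It writes text in an 8-bit charset
and falls back to decimal NCRs for anything the charset cannot represent.
The function name encode_with_ncrs is hypothetical; the
"xmlcharrefreplace" error handler is a standard Python codec feature.

```python
def encode_with_ncrs(text: str, encoding: str = "iso-8859-1") -> bytes:
    """Encode text in the given charset, replacing every character the
    charset cannot represent with a decimal NCR of the form &#NNNN;."""
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+00E9 exists in ISO 8859-1 and is written as a single byte.
    # U+2020 DAGGER (General Punctuation block) and U+20AC EURO SIGN
    # do not, so they are emitted as NCRs instead.
    print(encode_with_ncrs("caf\u00e9 \u2020 \u20ac"))
```

A browser that handles such a document correctly has to map the NCRs
back to UCS characters and find glyphs for them, even though the rest of
the page arrives in an 8-bit encoding.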

Netscape 4.X could add support for hexadecimal NCRs, but I don't know if
it's worth it.
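
For reference, the difference between the two NCR forms is small. The
following Python sketch (an illustration only, not how any browser
implements it) expands both decimal (&#8364;) and hexadecimal (&#x20AC;)
references. The name expand_ncrs and the choice to leave out-of-range
references untouched are assumptions made for this example.

```python
import re

# Matches &#NNNN; (decimal) or &#xHHHH; / &#XHHHH; (hexadecimal).
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_ncrs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Assumption for this sketch: references beyond the Unicode
        # range are left in the text unchanged.
        if code > 0x10FFFF:
            return match.group(0)
        return chr(code)

    return _NCR.sub(_replace, text)


if __name__ == "__main__":
    print(expand_ncrs("Price: 10 &#8364; or 10 &#x20AC;"))
```

Both references in the example name U+20AC, so a parser that accepts
only the decimal form would show the second one as literal text.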

Netscape 4.X is being maintained by a small group.

The open source version is called Mozilla, and is being developed by
Netscape and AOL engineers and external developers. Netscape will
probably ship its own version with the Netscape brand (name), and it may
be called 5.0. I don't know for sure, since that decision is made by
marketing people.

Mozilla already supports NCRs in non-Unicode documents, and hex NCRs
too.

  http://www.mozilla.org/binaries.html

Erik


