I am working on software to emit HTML in the encoding
and character set of the user's choice, from SGML/XML
documents which can contain any Plane 0 (BMP) Unicode character.
The question is what to do with characters outside the
selected encoding. I thought I would use numeric character
references, and IE5 at least seems to render those well,
but Netscape Communicator 4.6 doesn't.
One way to look at this is: how do I use Unicode as an
"escape" to include some isolated content on a web page
of arbitrary encoding?
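To make the "escape" idea concrete, here is a minimal sketch in Python (not the language of my actual software; the function name encode_with_ncr is made up for illustration). It relies on the standard "xmlcharrefreplace" error handler, which writes a decimal numeric character reference for every character the target charset cannot represent. Decimal rather than hex happens to matter for the Netscape behavior described further down.

```python
def encode_with_ncr(text: str, charset: str) -> bytes:
    """Encode text in the given charset; characters the charset cannot
    represent become decimal numeric character references (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Czech text fits in ISO-8859-2; DJE, GAMMA and KA do not.
sample = "Článek \u0402 \u0393 \u304B"
print(encode_with_ncr(sample, "iso-8859-2"))
```

This only covers the encoding step. Markup-significant characters such as & and < would still need their own escaping before this is applied.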
For example, I have something such as:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Unicode in a Latin 2 page</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
<body style="line-height: 16pt"><div class="pgbrk" style="padding-top: 48pt">
<p>Článek Úvod Žádný čest čin činěn činů činům činnost činnosti
jakmile jako jakož jakožto jazyka jež jediné jednat jednotkou jednotlivec</p>
<p>CYRILLIC CAPITAL LETTER DJE: &#x0402;</p>
<p>CAPITAL LETTER GAMMA: &#x0393;</p>
<p>HIRAGANA LETTER KA: &#x304B;</p>
<p>jeho jejich jemu jimi jiného jinému jiných jiným jinými jsou každému každý
which probably looks awful since your email client is not likely
set to display Latin 2, but which can also be seen at:
If I change the meta tag to:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
then Netscape does slightly better (it still stumbles over &#x-anything
and doesn't display the hiragana, but does display the DJE and GAMMA
if I use decimal values), but of course now the Czech words are not
displayed correctly.
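The reason the Czech text suffers is that its bytes are still Latin 2 while the page now claims UTF-8. A small sketch of that mismatch, again in Python and with a hypothetical helper name (decodes_as_utf8), assuming the page bytes were written as ISO-8859-2:

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


czech = "Článek"
latin2_bytes = czech.encode("iso-8859-2")  # what the page actually contains
utf8_bytes = czech.encode("utf-8")         # what a UTF-8 label promises

print(decodes_as_utf8(latin2_bytes))
print(decodes_as_utf8(utf8_bytes))
```

So switching the meta tag alone is not enough; the whole page would have to be re-encoded to match the declared charset.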
Is there some way I can nudge Netscape's browser to display these?
Is there a better way to write this admittedly mongrel HTML content?
I have heard somewhere that it is possible to change the charset
"on the fly", and if that would work, I would appreciate a pointer
to somewhere that says how best to do this.
Thanks in advance for any insights.
--- Gary Grosso firstname.lastname@example.org Arbortext, Inc. Ann Arbor, MI, USA