RE: Communicator Unicode

From: Gavin Nicol
Date: Mon Sep 29 1997 - 09:18:44 EDT

>> You can't. A given entity must all be in a single encoding of
>> the document character set.
>Gavin is incorrect. Since it is clear here that it is the entity's
>storage object being referred to, the encoding of the storage object has
>no necessary relationship to the document character set. Furthermore,
>the encoding of the entity as processed by an HTML parser also has no
>necessary relationship to the document character set. For all intents
>and purposes, the document character set is only useful in HTML for
>determining how to interpret numeric character references.

Correct me if I'm wrong, but doesn't the document character set define
the repertoire of characters that are legal within a document, and
what roles they should play (here I am actually using "document
character set" to include the syntax character set)? To me this means
that the entity must, in some way, encode characters from the document
character set.
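
To make the distinction concrete, here is a minimal sketch in Python. It
is an illustration only, not anything taken from the SGML or HTML
specifications. It assumes the document character set is ISO
10646/Unicode, and the names resolve_ncrs and parse_entity are
hypothetical. The same text stored in two different encodings decodes to
the same sequence of characters, and a numeric character reference is
resolved against the document character set rather than against the
storage encoding.

```python
import re


def resolve_ncrs(text):
    """Replace decimal numeric character references (&#NNN;) with the
    character at that code position in the document character set
    (assumed here to be ISO 10646/Unicode)."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


def parse_entity(storage_bytes, encoding):
    """Hypothetical helper: decode a storage object using its declared
    encoding, then resolve numeric character references."""
    return resolve_ncrs(storage_bytes.decode(encoding))


# One document, written once literally and once with a reference.
source = "caf\u00e9 and caf&#233;"

# Two storage objects for that document, in two different encodings.
latin1_bytes = source.encode("iso-8859-1")
utf8_bytes = source.encode("utf-8")

from_latin1 = parse_entity(latin1_bytes, "iso-8859-1")
from_utf8 = parse_entity(utf8_bytes, "utf-8")

# The bytes differ, but the characters seen after decoding do not.
print(latin1_bytes != utf8_bytes)
print(from_latin1 == from_utf8)
```

In both cases &#233; resolves to the same character, because 233 is a
code position in the document character set, not a byte value in either
encoding.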

There is only ever a single document character set in SGML, HTML, and
XML. I stand by my claim that you cannot mix "charsets" or "character
sets" in a single entity.
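
A practical consequence can be sketched in Python as well. This is an
illustration under stated assumptions, not the formal SGML argument, and
decodes_as is a hypothetical helper. If one storage object contains bytes
written in two different encodings, no single decoder recovers the
intended characters.

```python
def decodes_as(storage_bytes, encoding):
    """Hypothetical helper: report whether the bytes are valid in the
    given encoding."""
    try:
        storage_bytes.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


word = "caf\u00e9"

# One storage object whose first half is ISO-8859-1 and second half UTF-8.
mixed = word.encode("iso-8859-1") + word.encode("utf-8")

# Read as UTF-8, the ISO-8859-1 half is an invalid byte sequence.
print(decodes_as(mixed, "utf-8"))

# Read as ISO-8859-1, decoding succeeds, but the UTF-8 half turns into
# the wrong characters, so the result is not the intended text.
print(mixed.decode("iso-8859-1") == word + word)
```

Both lines print False: the mixed entity is either rejected or silently
misread, depending on which single encoding the parser assumes.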

I know that SGML doesn't say anything about handling of non-SGML
characters, but I do not believe that this detracts from the overall

