RE: japanese xml

From: Martin Duerst (duerst@w3.org)
Date: Fri Aug 31 2001 - 02:05:44 EDT


At 10:39 01/08/30 +0100, Misha.Wolf@reuters.com wrote:

>Additionally, if you are thinking of XML (or
>HTML) then you can encode *all* Unicode characters in an EUC-encoded
>document, by employing numeric character references for characters
>outside the EUC character repertoire. Using the same technique, you can
>encode all Unicode characters in an ASCII-encoded document.

One small clarification: Numeric character references can only be
used in content (including attribute values), but not in element or
attribute names, comments, or processing instructions. So if you have
a Japanese document with
ASCII-based markup (always true for HTML), or with Japanese markup
(what the question was about), euc-jp will work. However, if you
have Arabic element names, Devanagari attribute names, processing
instructions using Hangul, XML comments containing Mongolian,
or anything similar, you have to keep the document in some
Unicode-based encoding and cannot use euc-jp. Not that such things
are likely, but it is better to be sure.
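
To make the distinction concrete, here is a minimal sketch in Python.
It is only an illustration, not part of the original exchange: the
helper name encode_content and the sample strings are assumptions, and
it relies on Python's standard "xmlcharrefreplace" error handler and
the standard xml.etree.ElementTree parser. It shows an Arabic letter
surviving in EUC-JP content as a numeric character reference, and the
same reference being rejected when it appears in an element name.

```python
import xml.etree.ElementTree as ET


def encode_content(text: str, encoding: str = "euc_jp") -> bytes:
    """Hypothetical helper: encode text for use as XML content.

    Characters the target encoding cannot represent are replaced by
    decimal numeric character references such as &#1605;.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


# Japanese text plus one Arabic letter (U+0645), which EUC-JP lacks.
content = "日本語 \u0645"
encoded = encode_content(content)

# Decode the EUC-JP bytes back to text and parse them as element
# content. The parser resolves the reference to the original character.
xml_text = "<p>" + encoded.decode("euc_jp") + "</p>"
root = ET.fromstring(xml_text)
content_round_trips = root.text == content

# The same reference used as an element name is not well-formed XML,
# so the parser rejects the document.
try:
    ET.fromstring("<&#1605;>x</&#1605;>")
    name_accepted = True
except ET.ParseError:
    name_accepted = False

# Writing the Arabic name directly is not possible in EUC-JP either.
try:
    "\u0645".encode("euc_jp")
    name_encodable = True
except UnicodeEncodeError:
    name_encodable = False
```

Under these assumptions, content_round_trips comes out true while
name_accepted and name_encodable come out false, which is the case
described above where a Unicode-based encoding is required.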

And to Marco: It's great to hear that you think that the existence
of numeric character references in XML and HTML, and the fact that
they are based on Unicode, are common knowledge. For people like
Misha and me, who have worked on getting us there, it may take some
more time to be convinced of that.

Regards, Martin.


