RE: japanese xml

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Thu Aug 30 2001 - 13:27:10 EDT


That's not what he said in the responses *I* read. Perhaps I missed one on
this thread. As near as I recall, Misha wrote:

"Of course, EUC (EUC-JP in the
case of Japanese) may cover all the characters you require, in which
case there is no problem. Additionally, if you are thinking of XML (or
HTML) then you can encode *all* Unicode characters in an EUC-encoded
document, by employing numeric character references for characters
outside the EUC character repertoire."

IOW, that's not "EUC-unicode". I don't see a mention anywhere of that
(hypothetical) encoding. That's "EUC-JP with characters outside EUC-JP
represented as NCRs", and our parser handles that quite well...
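
A minimal sketch of that approach, assuming Python's standard "euc_jp"
codec and a hypothetical sample string. The Hangul syllable stands in for
any character outside the EUC-JP repertoire. The bytes are decoded by hand
before parsing, so the sketch does not depend on any particular parser
supporting EUC-JP natively:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: Japanese text plus a Hangul syllable (U+D55C),
# which is outside the EUC-JP repertoire.
text = "日本語 한"

# Characters the codec cannot encode are written as numeric character
# references (here &#54620;) instead of raising an error.
body = text.encode("euc_jp", errors="xmlcharrefreplace")
document = b'<?xml version="1.0" encoding="EUC-JP"?><doc>' + body + b"</doc>"

# Assumption for this sketch: decode the bytes ourselves, drop the XML
# declaration, and hand the parser only the element as a string.
decoded = document.decode("euc_jp")
element = decoded.split("?>", 1)[1]
root = ET.fromstring(element)

# The parser expands the NCR, so the original text round-trips.
assert root.text == text
```

The file on disk stays pure EUC-JP; only the parser's internal
representation ever holds the out-of-repertoire character directly.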

Addison

-----Original Message-----
From: Ayers, Mike [mailto:Mike_Ayers@bmc.com]
Sent: Thursday, August 30, 2001 10:00 AM
To: 'Addison Phillips [wM]'
Cc: unicode@unicode.org
Subject: RE: japanese xml

> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> Sent: Thursday, August 30, 2001 09:51 AM

> 4. However, you can use any other encoding, provided you tag the file
> appropriately (so that the parser knows what the encoding is and can
> translate it to its internal representation).

        Slight but relevant correction: you can use any encoding of which
the parser is aware.
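
        A rough sketch of that check, using Python's codec registry as a
stand-in for "encodings the parser is aware of" (an assumption; a real
parser keeps its own list). The helper names are hypothetical, and the
declaration sniffing only works for ASCII-compatible encodings:

```python
import codecs
import re

# Matches an XML declaration at the very start of the data and captures
# the declared encoding name.
_DECL = re.compile(
    rb'<\?xml[^>]*encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(xml_bytes, default="utf-8"):
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(xml_bytes)
    return match.group(1).decode("ascii") if match else default

def is_known(name):
    """True if the codec registry can resolve the encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

sample = b'<?xml version="1.0" encoding="EUC-JP"?><doc/>'
name = declared_encoding(sample)
known = is_known(name)
```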

> 5. You are not required to use EUC-JP for your Japanese XML
> files: you can
> use the Unicode encodings directly. In some cases, though, your file
> editing software may make it easier to work with EUC-JP (or
> Shift-JIS/Microsoft Code Page 932).

        Misha was not talking about EUC-JP, but rather EUC-unicode (or some
name like that), which encodes Unicode scalar values using the EUC method, and
uses character references for those values (most of them) that are outside
of the EUC encoding range. Have you tested your parser against that?

/|/|ike


