On 30/08/2001 18:00:22 Mike Ayers wrote:
[...]
> Misha was not talking about EUC-JP, rather EUC-unicode (or some name
> like that), which encodes unicode scalar values using the EUC method, and
> uses character references for those values (most of them) that are outside
> of the EUC encoding range. Have you tested your parser against that?
Interesting. My original reply is pasted in below. Please
tell me how you managed to arrive at your interpretation.
Thanks,
Misha
~~~~~~~~~~~~~
That is, IMO, quite a misleading reply. It would be more helpful to say
something like:
Yes, it is OK for Unicode code points to be encoded using EUC. Keep in
mind, though, that the EUC character repertoire is a lot smaller than
the Unicode character repertoire. Consequently, many Unicode characters
cannot be directly encoded using EUC. Of course, EUC (EUC-JP in the
case of Japanese) may cover all the characters you require, in which
case there is no problem. Additionally, if you are thinking of XML (or
HTML) then you can encode *all* Unicode characters in an EUC-encoded
document, by employing numeric character references for characters
outside the EUC character repertoire. Using the same technique, you can
encode all Unicode characters in an ASCII-encoded document.
Misha
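
For illustration only, here is a minimal Python sketch of the technique described above: characters that the target encoding cannot represent are written as numeric character references. The helper name encode_with_char_refs is hypothetical; it relies on the standard library's "euc_jp" codec and its "xmlcharrefreplace" error handler, and it assumes the output will be placed in XML or HTML character data, where such references are recognized.

```python
import html


def encode_with_char_refs(text: str, encoding: str = "euc_jp") -> bytes:
    """Encode text, writing unencodable characters as &#NNNN; references.

    Hypothetical helper for illustration; it does not escape markup
    characters such as '<' or '&' that are already present in the text.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


# Japanese text is inside the EUC-JP repertoire, so it is encoded directly.
japanese = encode_with_char_refs("日本語")

# DEVANAGARI LETTER A (U+0905) is outside the EUC-JP repertoire,
# so it becomes a decimal numeric character reference.
mixed = encode_with_char_refs("日本語 अ")

# The same technique works with a plain ASCII target encoding.
ascii_only = encode_with_char_refs("é", "ascii")

# A consumer decodes the bytes, then resolves the references.
round_trip = html.unescape(mixed.decode("euc_jp"))
```

An XML or HTML parser resolves the references on its own; html.unescape stands in for that step here so the round trip can be checked without a parser.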
This archive was generated by hypermail 2.1.2 : Thu Aug 30 2001 - 14:58:25 EDT