RE: japanese xml

From: Misha.Wolf@reuters.com
Date: Thu Aug 30 2001 - 09:06:15 EDT


IMO, I correctly replied to Viranga's question and I've
no idea what you're talking about below.

Misha

On 30/08/2001 13:46:57 Marco Cimarosti wrote:
> Misha Wolf wrote:
> > On 30/08/2001 09:16:21 Marco Cimarosti wrote:
> > > Viranga Ratnaike wrote:
> > > > Is it ok for Unicode code points to be
> > > > encoded/serialized using EUC?
> > [...]
> > >
> > > EUC size simply doesn't fit Unicode.
> > >
> > [...]
> > That is, IMO, quite a misleading reply. It would be more
> > helpful to say something like:
> >
> > Yes, it is OK for Unicode code points to be encoded using
> > EUC.
>
> *This* is, IMHO, a *very* misleading statement.
>
> No: it is not OK to encode Unicode in EUC, because this would be technically
> impossible, as I explained before and you explain again here:
>
> > Keep in
> > mind, though, that the EUC character repertoire is a lot smaller than
> > the Unicode character repertoire. Consequently, many Unicode
> > characters cannot be directly encoded using EUC.
>
> In fact: only Unicode chars U+0000 to U+4000 would be representable in a
> hypothetical "euc-unicode" encoding (so, e.g., no Unified ideographs would
> be allowed, as they start at U+4E00).
>
> Moreover, even if such a hybrid were possible, no current application would
> recognize or process it, because the only expected forms of Unicode are
> UTF-8, UTF-16, and UTF-32 (plus some obsolete or variant forms that are not
> worth mentioning here).
>
> > Of course, EUC (EUC-JP in the
> > case of Japanese) may cover all the characters you require, in which
> > case there is no problem.
>
> What does Unicode have to do with this!? You are talking now about EUC-JP
> (a.k.a. EUC-JIS); Viranga was asking about using EUC to serialize Unicode.
>
> > Additionally, if you are thinking of XML (or
> > HTML) then you can encode *all* Unicode characters in an EUC-encoded
> > document, by employing numeric character references for characters
> > outside the EUC character repertoire. Using the same technique, you can
> > encode all Unicode characters in an ASCII-encoded document.
>
> OK. But what does this have to do with Unicode, JIS, EUC, or anything else
> in Viranga's question?
>
> You are not obliged to reply to a question but, if you decide to do so, you
> should reply to it, not to something else.
>
> _ Marco
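
The numeric character reference technique mentioned in the quoted text can be sketched roughly as follows. This is only an illustration, not anything from the thread: it assumes Python's built-in "euc_jp" and "ascii" codecs together with the standard "xmlcharrefreplace" error handler, and the function names are hypothetical.

```python
def fits_repertoire(text: str, encoding: str = "euc_jp") -> bool:
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def encode_with_ncrs(text: str, encoding: str = "euc_jp") -> bytes:
    """Encode text, writing characters outside the target repertoire
    as decimal numeric character references (&#N;)."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Three kanji (covered by EUC-JP) followed by THAI CHARACTER KO KAI
    # (U+0E01), which is assumed here to be outside the EUC-JP repertoire.
    sample = "\u65e5\u672c\u8a9e\u0e01"
    print(fits_repertoire(sample))            # is the whole string encodable?
    print(encode_with_ncrs(sample))           # EUC-JP bytes plus one reference
    print(encode_with_ncrs(sample, "ascii"))  # every character as a reference
```

Note that the error handler only substitutes characters missing from the target encoding. It does not escape "&" or "<", and the references are meaningful only where XML or HTML allows them (character data and attribute values, not element names).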




This archive was generated by hypermail 2.1.2 : Thu Aug 30 2001 - 10:28:30 EDT