RE: japanese xml

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Aug 30 2001 - 17:38:01 EDT


David,

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of David Starner
> Sent: Thursday, August 30, 2001 11:13 AM
> To: unicode@unicode.org
> Subject: Re: japanese xml
>
>
> On Thu, Aug 30, 2001 at 09:51:24AM -0700, Addison Phillips [wM] wrote:
> > And it is worth mentioning, because, in fact,
> > EUC-JP (and many other encodings) are perfectly interoperable -- for the
> > subset of characters that they represent.
>
> One of the big complaints I hear in trying to Unicodize Linux is that
> EUC-JP, Shift-JIS, and CP932 are all encodings that include
> JIS X 0208, but they all map the JIS X 0208 portion differently.
> So EUC-JP <-> Shift-JIS produces different results than EUC-JP <->
> Unicode <-> Shift-JIS. That's not perfectly interoperable.
>

There are ways to improve the compatibility. For example, with ICU it is
impossible to have a complete set of any-to-any converters, so to convert
Shift_JIS to EUC-JP I convert to UTF-16 first. In the process there are
some characters, like FULLWIDTH HYPHEN-MINUS (U+FF0D), where the JIS
equivalent is a double-wide minus sign. If I first call
ucnv_setFallback(open_locale->conv, TRUE); on the converter, then this
character will convert.

The ibm-943 converter has:

<UFF0A> \x81\x96 |0
<UFF0B> \x81\x7B |0
<UFF0C> \x81\x43 |0
<UFF0D> \x81\x7C |1
<UFF0E> \x81\x44 |0
<UFF0F> \x81\x5E |0

Note that FF0D is the only entry here marked |1, the compatibility
(fallback) mapping flag, so it converts only when fallbacks are enabled.
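
To make the effect of that flag concrete, here is a minimal sketch in plain C.
It does not call ICU at all; it only mimics the behaviour described above. The
names Mapping, parse_mapping, mapping_applies and from_unicode are made up for
this illustration, and it assumes that |0 marks an ordinary mapping and |1 a
mapping that is used only when fallback is switched on.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_BYTES 4

/* One line of a mapping table, e.g. "<UFF0D> \x81\x7C |1" (hypothetical type). */
typedef struct {
    unsigned int code_point;          /* Unicode value, e.g. 0xFF0D        */
    unsigned char bytes[MAX_BYTES];   /* target bytes, e.g. 0x81 0x7C      */
    int byte_count;
    int flag;                         /* assumed: 0 = ordinary, 1 = fallback */
} Mapping;

/* Parse one table line. Returns 1 on success, 0 if the line is malformed. */
static int parse_mapping(const char *line, Mapping *out)
{
    int consumed = 0;
    const char *p;

    assert(line != NULL && out != NULL);
    out->byte_count = 0;

    /* "<Uxxxx>" then optional whitespace; consumed stays 0 if '>' is missing. */
    if (sscanf(line, " <U%x> %n", &out->code_point, &consumed) != 1 || consumed == 0)
        return 0;

    /* Read each "\xHH" byte escape in turn. */
    p = line + consumed;
    while (p[0] == '\\' && p[1] == 'x' &&
           isxdigit((unsigned char)p[2]) && isxdigit((unsigned char)p[3])) {
        unsigned int byte;
        if (out->byte_count == MAX_BYTES)
            return 0;
        sscanf(p + 2, "%2x", &byte);
        out->bytes[out->byte_count++] = (unsigned char)byte;
        p += 4;
    }
    if (out->byte_count == 0)
        return 0;

    /* Trailing "|n" flag. */
    return sscanf(p, " |%d", &out->flag) == 1;
}

/* An ordinary mapping always applies; a fallback one only when enabled. */
static int mapping_applies(const Mapping *m, int use_fallback)
{
    return m->flag == 0 || (m->flag == 1 && use_fallback);
}

/* Look up cp in the table. Copies the bytes to dst and returns their count,
 * or returns 0 if no usable mapping exists under the current setting. */
static int from_unicode(const Mapping *table, int n, unsigned int cp,
                        int use_fallback, unsigned char *dst)
{
    int i, j;

    assert(table != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        if (table[i].code_point == cp && mapping_applies(&table[i], use_fallback)) {
            for (j = 0; j < table[i].byte_count; j++)
                dst[j] = table[i].bytes[j];
            return table[i].byte_count;
        }
    }
    return 0;
}
```

With the FF0D line parsed into a table, from_unicode(table, n, 0xFF0D, 0, dst)
finds nothing, while the same call with use_fallback set to 1 yields the two
bytes 0x81 0x7C. A real ICU converter does considerably more than this, so
treat it only as a model of the flag.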

Carl


