Re: Java and Unicode

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Wed Nov 15 2000 - 14:08:05 EST

Next message: Elaine Keown: "sort of OT: politics and scripts"
Previous message: Ayers, Mike: "RE: Devanagari question"
Maybe in reply to: Jani Kajala: "Java and Unicode"
Next in thread: Roozbeh Pournader: "Re: Java and Unicode"

On Wed, 15 Nov 2000, Jungshik Shin wrote:

> On Wed, 15 Nov 2000, Michael (michka) Kaplan wrote:
> > In any case, I think that UTF-16 is the answer here.
> >
> > Many people try to compare this to DBCS, but it really is not the same
> > thing.... understanding lead bytes and trail bytes in DBCS is *astoundingly*
> > more complicated than handling surrogate pairs.
>
> Well, it depends on what multibyte encoding you're talking about. In case
> of 'pure' EUC encodings (EUC-JP, EUC-KR, EUC-CN, EUC-TW) as opposed to
> SJIS(Windows94?), Windows-949(UHC), Windows-950, Windows-125x(JOHAB),
> ISO-2022-JP(-2), ISO-2022-KR, ISO-2022-CN, it's not that hard (about
> the same as UTF-16, I believe, especially in case of EUC-CN and EUC-KR)

I would move EUC-JP and EUC-TW, and possibly EUC-KR (if you use more than
KS X 1001 in it), to the "complicated" group because of the single-shift
bytes (SS2, SS3) required to get to different planes/character sets.
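
To make the comparison concrete, here is a rough Java sketch (not taken from
any real decoder) of how many bytes a character occupies, judged from its
lead byte alone. The class and method names are made up for illustration. It
assumes EUC-JP carries JIS X 0208 in the plain two-byte range, half-width
katakana after SS2 (0x8E), and JIS X 0212 after SS3 (0x8F), and that EUC-TW
uses SS2 plus a plane byte to reach the other CNS 11643 planes. Trail bytes
are not validated.

```java
// Illustrative sketch only: names are hypothetical, and trail bytes are not checked.
final class EucLeadByte {
    static final int SS2 = 0x8E; // single shift 2
    static final int SS3 = 0x8F; // single shift 3

    /** Length in bytes of an EUC-JP character, given its lead byte; -1 if invalid. */
    static int eucJpLength(int lead) {
        lead &= 0xFF;
        if (lead < 0x80) return 1;                   // ASCII range
        if (lead == SS2) return 2;                   // SS2 + one half-width katakana byte
        if (lead == SS3) return 3;                   // SS3 + two bytes of JIS X 0212
        if (lead >= 0xA1 && lead <= 0xFE) return 2;  // two bytes of JIS X 0208
        return -1;                                   // not a valid lead byte
    }

    /** Length in bytes of an EUC-TW character, given its lead byte; -1 if invalid. */
    static int eucTwLength(int lead) {
        lead &= 0xFF;
        if (lead < 0x80) return 1;                   // ASCII range
        if (lead == SS2) return 4;                   // SS2 + plane byte + two bytes
        if (lead >= 0xA1 && lead <= 0xFE) return 2;  // two bytes of CNS 11643 plane 1
        return -1;                                   // not a valid lead byte
    }

    /** Length in 16-bit code units of a UTF-16 character, given its first unit. */
    static int utf16Length(char unit) {
        return Character.isHighSurrogate(unit) ? 2 : 1;
    }
}
```

The UTF-16 case has a single branch, while EUC-JP needs three multibyte cases
and EUC-TW ranges from one to four bytes. EUC-CN and EUC-KR (with KS X 1001
only) would need just the ASCII and two-byte branches.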

Thomas Chan
tc31@cornell.edu

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT