Re: FW: Web Form: Other Question: CJK

From: Edward H. Trager (ehtrager@umich.edu)
Date: Sat Oct 11 2003 - 09:31:05 CST


On Friday 2003.10.10 14:48:58 -0700, Magda Danish (Unicode) wrote:
> Roberto,
>
> I am forwarding your question to the Unicode mailing list for possible
> answers from the list's subscribers.
>
> Regards,
>
> Magda Danish
> Administrative Director
> The Unicode Consortium
> 650-693-3921
>
>
> > -----Original Message-----
> > Date/Time: Thu Oct 9 10:20:19 EDT 2003
> > Contact: roberto.carcione@ampersoftware.it
> > Report Type: Other Question, Problem, or Feedback
> >
> > Hi all,
> > I have a small question:
> > Characters in the Unicode range U+4E00 to U+9FFF are Unified
> > Ideographs for CJK languages. Does this mean that all the
> > characters are shared by the Chinese, Japanese and Korean
> > languages?

Yes, that's why they are called "unified".

> > If I take a character, for example U+4E01,
> > is it a valid character for all three languages?

Most likely, but not every character is used in all three. Some
characters occur only in modern simplified Chinese, some for the most
part only in modern traditional Chinese (as used in Taiwan or Hong
Kong), and some only in Japanese.

> > My problem is to recognize, from the 32-bit value of a Unicode
> > character, whether it is a Chinese, Korean or Japanese
> > character. How can I do this?

You can't, so don't try to do it on a character-by-character basis.
Because the ideographs are unified, the code point alone carries no
language information. As a human looking at a string of text, you can
tell what language it is from the context. For Japanese you will expect
to see Hiragana or Katakana mixed in with the ideographs, and for Korean
you will expect to see Hangul syllables. But there is every possibility
that a Korean text might contain embedded Chinese quotations, or a
Japanese text embedded Korean, or ... you get the idea ...
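
To make the "look at the context" idea concrete, here is a minimal
sketch in Python. It is only a heuristic under stated assumptions, not
a real language detector: the function names and the "ja"/"ko"/"zh"
labels are my own, the block ranges are hard-coded for the basic
Hiragana, Katakana, Hangul and CJK Unified Ideographs blocks only, and
it will guess wrong for the mixed texts described above or for a
Japanese string that happens to contain no kana.

```python
from collections import Counter


def classify_char(ch):
    """Return the script of one character, based only on its code point."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    # Hangul Jamo, Hangul Compatibility Jamo, Hangul Syllables
    if 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F or 0xAC00 <= cp <= 0xD7A3:
        return "hangul"
    # Unified ideographs: shared by Chinese, Japanese and Korean,
    # so this says nothing about the language.
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    return "other"


def guess_language(text):
    """Guess a language from the mix of scripts in the whole string.

    Hypothetical heuristic: kana suggests Japanese, Hangul suggests
    Korean, ideographs with neither suggest Chinese.
    """
    counts = Counter(classify_char(ch) for ch in text)
    kana = counts["hiragana"] + counts["katakana"]
    hangul = counts["hangul"]
    if kana > hangul:
        return "ja"
    if hangul > kana:
        return "ko"
    if kana == 0 and counts["han"] > 0:
        return "zh"  # only a guess: could be kanji-only Japanese or hanja
    return "unknown"
```

Calling classify_char("\u4E01") returns "han" and nothing more, which is
exactly the point: the single character cannot tell you the language.
guess_language() only does better because it looks at the neighbouring
characters. If the application knows the language some other way (user
locale, document metadata, language tagging), that is more reliable
than any guess like this.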

> >
> > I develop international applications under Win98 and Win2000 with
> > Visual Studio 6.0
> >
> > thanks a lot.
> >
> > Roberto (ITALY)
> >
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > (End of Report)
> >
> >
> >
>


