Re: Web Form: Other Question: CJK

From: Chris Jacobs (chris.jacobs@freeler.nl)
Date: Fri Oct 10 2003 - 20:38:26 CST


If you have a scalar value, you can look it up in the Unihan database.

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e01
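
A minimal Python sketch of building that lookup URL from a scalar value. It assumes only what the example URL above shows, namely that the script takes the code point as lowercase hex in a `codepoint` parameter; the function name is made up for illustration.

```python
# Base address taken from the example URL above.
UNIHAN_LOOKUP = "http://www.unicode.org/cgi-bin/GetUnihanData.pl"


def unihan_lookup_url(scalar: int) -> str:
    """Return the Unihan lookup URL for a Unicode scalar value."""
    # Format the scalar as lowercase hex, as in "codepoint=4e01".
    return f"{UNIHAN_LOOKUP}?codepoint={scalar:x}"


# ord() gives the scalar value of a one-character string.
print(unihan_lookup_url(ord("\u4e01")))
# prints http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e01
```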

I would not rely on the mappings to major standards to determine the
language. I can imagine that the Chinese include some non-Chinese
kanji in their standards because they come up in their foreign affairs.

I would go for the Phonetic Data.

If there is an entry only for Cantonese or Mandarin pronunciation, then
surely it is Chinese.
If there is an entry only for Japanese Kun pronunciation, then surely it is
Japanese.
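
The rule above could be sketched in Python roughly as follows. This is only an illustration: it assumes the phonetic data is exposed under the Unihan field names kCantonese, kMandarin, kJapaneseKun, kJapaneseOn and kKorean, and the function name is hypothetical. Anything that does not match one of the two cases is reported as undecided.

```python
# Assumed Unihan field names for the phonetic data.
CHINESE_READINGS = {"kCantonese", "kMandarin"}
PHONETIC_FIELDS = CHINESE_READINGS | {"kJapaneseKun", "kJapaneseOn", "kKorean"}


def guess_from_readings(fields):
    """Guess a language from the set of Unihan fields present for one character.

    Returns "Chinese", "Japanese", or None when the rule does not decide.
    """
    readings = set(fields) & PHONETIC_FIELDS
    if not readings:
        return None  # no phonetic data at all
    if readings <= CHINESE_READINGS:
        return "Chinese"  # only Cantonese and/or Mandarin readings
    if readings == {"kJapaneseKun"}:
        return "Japanese"  # only a Japanese Kun reading
    return None  # readings for several languages: undecided


# Made-up field sets, not taken from the real database.
print(guess_from_readings({"kMandarin", "kCantonese"}))
print(guess_from_readings({"kJapaneseKun"}))
print(guess_from_readings({"kMandarin", "kJapaneseOn", "kKorean"}))
```

Most common ideographs carry readings for several languages, so the last case (undecided) is the one to expect most often.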

Do not try to guess the language of a text from just one value. If the text
contains kana, assume the kanji are Japanese too; if the text contains
hangul, assume the kanji are Korean too.
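
A rough Python sketch of that whole-text check. The block ranges used are the Hiragana, Katakana, Hangul Jamo, Hangul Compatibility Jamo and Hangul Syllables blocks plus the U+4E00..U+9FFF ideograph range from the question; the function name and return strings are made up, and checking kana before hangul is an arbitrary choice for texts that contain both.

```python
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]    # Hiragana, Katakana
HANGUL = [(0x1100, 0x11FF), (0x3130, 0x318F),  # Jamo, Compatibility Jamo
          (0xAC00, 0xD7AF)]                    # Hangul Syllables
HAN = [(0x4E00, 0x9FFF)]                       # CJK Unified Ideographs


def _contains(text, ranges):
    """True if any character of text falls inside one of the ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


def guess_text_language(text):
    """Guess the language of a whole text from the scripts it mixes."""
    if _contains(text, KANA):
        return "Japanese"      # kana present: treat the kanji as Japanese
    if _contains(text, HANGUL):
        return "Korean"        # hangul present: treat the kanji as Korean
    if _contains(text, HAN):
        return "undetermined"  # ideographs only: the scripts do not decide
    return None                # no CJK characters at all


print(guess_text_language("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"))  # Japanese
print(guess_text_language("\ud55c\uad6d\uc5b4 \u6f22\u5b57"))                    # Korean
print(guess_text_language("\u4e2d\u6587"))                                      # undetermined
```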

If you don't want to consult the Unihan database over the WWW, the data
files for it are available at ftp://ftp.unicode.org/
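
For working from the data files instead, here is a small Python sketch of a reader. It assumes the files use tab-separated lines of the form "U+XXXX<TAB>field<TAB>value" with "#" comment lines; check this against the header of the file you actually download. The function name and the sample lines are made up.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Parse Unihan-style lines into {scalar value: {field name: value}}."""
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue  # skip anything not in the assumed format
        code, field, value = parts
        table[int(code[2:], 16)][field] = value
    return dict(table)


# Made-up sample lines, not copied from the real data files.
sample = [
    "# sample data for illustration",
    "U+4E01\tkMandarin\tPLACEHOLDER",
    "U+4E01\tkJapaneseKun\tPLACEHOLDER",
]
entries = parse_unihan(sample)
print(sorted(entries[0x4E01]))  # field names found for U+4E01
```

With a real file you would pass the open file object instead of the sample list, for example `parse_unihan(open(path, encoding="utf-8"))`, where `path` is wherever you saved the download.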

----- Original Message -----
From: "Magda Danish (Unicode)" <v-magdad@microsoft.com>
To: <unicode@unicode.org>
Cc: <roberto.carcione@ampersoftware.it>
Sent: Friday, October 10, 2003 11:48 PM
Subject: FW: Web Form: Other Question: CJK

> Roberto,
>
> I am forwarding your question to the Unicode mailing list for possible
> answers from the list's subscribers.
>
> Regards,
>
> Magda Danish
> Administrative Director
> The Unicode Consortium
> 650-693-3921
>
>
> > -----Original Message-----
> > Date/Time: Thu Oct 9 10:20:19 EDT 2003
> > Contact: roberto.carcione@ampersoftware.it
> > Report Type: Other Question, Problem, or Feedback
> >
> > Hi all,
> > I have a little question:
> > Characters in the Unicode range U+4E00 to U+9FFF are Unified
> > Ideographs for CJK languages. Does this mean that all the characters
> > are shared by the Chinese, Japanese and Korean languages? If I take a
> > character, for example U+4E01, is it a valid character for all three
> > languages?
> > My problem is to recognize from the 32-bit value of a Unicode
> > character whether it is a Chinese, Korean or Japanese character.
> > How can I do this?
> >
> > I develop international applications under Win98 and Win2000 with
> > Visual Studio 6.0.
> >
> > thanks a lot.
> >
> > Roberto (ITALY)
> >
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > (End of Report)
> >
> >
> >
>
>


