Re: FW: Web Form: Other Question: CJK

From: John Delacour (JD@BD8.COM)
Date: Sat Oct 11 2003 - 10:49:09 CST


> > Contact: roberto.carcione@ampersoftware.it
> > Report Type: Other Question, Problem, or Feedback
> >
> > My problem is to recognize, from the 32-bit value of a Unicode
> > character, whether it is a Chinese, Korean, or Japanese character.
> > How can I do this?

You can tell that a character is NOT from a legacy character set such
as Shift_JIS or Big5 if converting it to that character set fails. Or
you can look it up in Unihan.txt
<http://www.unicode.org/Public/UNIDATA/Unihan.txt> (25 megabytes,
also at the FTP site). There are also Perl routines for getting at
this information.

Each data line of the file gives a code point, a field name and a
value, for example:

U+4E01 kAlternateKangXi 0075.003
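A minimal Python sketch of both approaches, offered only as an illustration (it is not one of the Perl routines mentioned above). Assumptions: the codec names are Python's built-in ones; gb2312 and euc_kr are added here for Simplified Chinese and Korean and do not come from the post; the kIRG_*Source field names and their region labels are an assumption about Unihan's source fields and should be checked against the Unihan documentation. Keep in mind that a failed conversion only proves a character is absent from a set, and a unified ideograph can belong to several national standards at once, so neither check yields a single language.

```python
# Sketch of the two approaches: (1) legacy-encoding test, (2) Unihan lookup.

# 1) If a character cannot be encoded in a legacy set, it is NOT from that set.
#    Success only shows membership; many ideographs are in several sets.
#    gb2312 and euc_kr are assumptions added alongside shift_jis and big5.
LEGACY_CODECS = ("shift_jis", "big5", "gb2312", "euc_kr")


def encodable_in(ch: str, codec: str) -> bool:
    """Return True if `ch` can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def legacy_sets(ch: str) -> list[str]:
    """List the codecs from LEGACY_CODECS that can represent `ch`."""
    return [codec for codec in LEGACY_CODECS if encodable_in(ch, codec)]


# 2) Unihan.txt data lines hold: code point, field name, value.
def parse_unihan(lines) -> dict[int, dict[str, str]]:
    """Build {code point: {field: value}} from Unihan.txt-style lines."""
    data: dict[int, dict[str, str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on tabs or spaces, at most twice, so the value keeps its spaces.
        cp_text, field, value = line.split(None, 2)
        code_point = int(cp_text[2:], 16)  # "U+4E01" -> 0x4E01
        data.setdefault(code_point, {})[field] = value
    return data


# Hypothetical mapping: kIRG_*Source fields are assumed to record the
# national standard(s) an ideograph was taken from.
SOURCE_FIELDS = {
    "kIRG_GSource": "China",
    "kIRG_TSource": "Taiwan",
    "kIRG_JSource": "Japan",
    "kIRG_KSource": "Korea",
}


def source_regions(entry: dict[str, str]) -> list[str]:
    """Return the regions whose source field appears in one Unihan entry."""
    return [region for field, region in SOURCE_FIELDS.items() if field in entry]
```

For example, legacy_sets("한") returns ["euc_kr"], and parsing the line "U+4E01\tkAlternateKangXi\t0075.003" maps 0x4E01 to {"kAlternateKangXi": "0075.003"}. Splitting on any whitespace also accepts the space-separated form quoted above.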

JD



This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST