An excellent source is the pages for the 10th International Unicode Conference.
It also has the data in Unicode, so you can check your work.
Another option is to surf the home pages for the major Asian companies.
If you want lots of random text (sometimes very random :-), you can get
messages from Usenet news.
The tw.* hierarchy is from Taiwan.
The hk.* hierarchy is from Hong Kong.
The fj.* hierarchy is from Japan.
The han.* hierarchy is from Korea.
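
Once you have a sample file and a Unicode copy of the same text, checking your work could look roughly like the sketch below. It is only an illustration, not a finished converter: the class and method names (CjkToUnicode, decode, matchesReference) are made up for this example, it assumes a current JDK whose installation includes the legacy CJK charsets (Big5, GB2312, Shift_JIS, EUC-KR), and it assumes the Unicode reference copy is stored as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class CjkToUnicode {

    // Decode bytes in a legacy charset (e.g. "Big5") into a Java String,
    // which holds the text as Unicode (UTF-16) internally.
    // Strict mode: malformed or unmappable input raises an error instead of
    // being silently replaced, so a wrong charset guess shows up early.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return Charset.forName(charsetName)
                    .newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Input is not valid " + charsetName, e);
        }
    }

    // Compare the decoded legacy text with a reference copy.
    // Assumption: the reference copy is encoded as UTF-8.
    static boolean matchesReference(byte[] legacy, String charsetName,
                                    byte[] utf8Reference) {
        String reference = new String(utf8Reference, StandardCharsets.UTF_8);
        return decode(legacy, charsetName).equals(reference);
    }
}
```

To use it on real files, read each file into a byte array (for example with java.nio.file.Files.readAllBytes) and pass the charset name that matches the source, such as "Big5" for most tw.* and hk.* postings.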
On Sun, 8 Mar 1998, Mustafa Hasham wrote:
> As part of a project in a CS class, I intend to convert CJK encoded text
> files into Unicode. I am using Windows NT and program in Java. Does anyone
> out there know of any sample text files I can use? Any encoding scheme
> would be fine... Big5, Kanji, GB, etc.. I do not have access to an input