Re: Sample CJK files...

From: Mustafa Hasham (ha98my21@acs.wooster.edu)
Date: Sun Mar 08 1998 - 21:25:26 EST


Hi Jake:

I have been hacking away at Java's inability to correctly recognize the
little-endian text that NT produces. I think I have a software solution.
Now I need text data to test it with.
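
For illustration, a workaround of this kind might look like the sketch
below. It is not the code from the project. It assumes the NT text is
UTF-16 and starts with the little-endian byte order mark FF FE. The class
name ByteFlipper and its methods are hypothetical, and StandardCharsets
comes from later Java releases that can also decode UTF-16LE directly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not from the original post.
final class ByteFlipper {
    private ByteFlipper() {}

    /** Returns a copy of the input with the two bytes of each 16-bit unit swapped. */
    static byte[] flip(byte[] in) {
        if (in.length % 2 != 0) {
            throw new IllegalArgumentException("UTF-16 data must have an even byte count");
        }
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i += 2) {
            out[i] = in[i + 1];
            out[i + 1] = in[i];
        }
        return out;
    }

    /** True if the data starts with the little-endian byte order mark FF FE. */
    static boolean startsWithLittleEndianBom(byte[] in) {
        return in.length >= 2 && (in[0] & 0xFF) == 0xFF && (in[1] & 0xFF) == 0xFE;
    }

    /** True if the data starts with the big-endian byte order mark FE FF. */
    static boolean startsWithBigEndianBom(byte[] in) {
        return in.length >= 2 && (in[0] & 0xFF) == 0xFE && (in[1] & 0xFF) == 0xFF;
    }

    /**
     * Decodes UTF-16 bytes to a String. Data marked little-endian is flipped
     * first; anything else is assumed (an assumption of this sketch) to be
     * big-endian already. A leading byte order mark is dropped.
     */
    static String decode(byte[] in) {
        byte[] bigEndian = startsWithLittleEndianBom(in) ? flip(in) : in;
        int offset = startsWithBigEndianBom(bigEndian) ? 2 : 0;
        return new String(bigEndian, offset, bigEndian.length - offset,
                StandardCharsets.UTF_16BE);
    }
}
```

Calling ByteFlipper.decode on the bytes FF FE 2D 4E would flip them to
FE FF 4E 2D, skip the mark, and yield the single character U+4E2D.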

Thanks for all your help so far. I will look at the pages you have
recommended and will keep you updated.

The Java converters are very picky about the text you give them, but a
simple byte "flipper" worked for me. :)

Mustafa

On Sun, 8 Mar 1998, Jake Morrison wrote:

> Mustafa,
>
> An excellent source is the pages for the 10th International Unicode
> Conference:
> http://www.unicode.org/unicode/iuc10/languages.html
>
> It also has the data in Unicode, so you can check your work.
>
> Another option is to surf the home pages for the major Asian companies.
>
> If you want lots of random text (sometimes very random :-), you can get
> messages from Usenet news.
>
> The tw.* hierarchy is from Taiwan
> The hk.* hierarchy is from Hong Kong
> The fj.* hierarchy is from Japan
> The han.* hierarchy is from Korea
>
> Regards,
> Jake
>
> On Sun, 8 Mar 1998, Mustafa Hasham wrote:
>
> >
> > Hi:
> >
> > As part of a project in a CS class, I intend to convert CJK encoded text
> > files into Unicode. I am using Windows NT and program in Java. Does anyone
> > out there know of any sample text files I can use? Any encoding scheme
> > would be fine... Big5, Kanji, GB, etc. I do not have access to an input
> > editor.
> >
> > Thanks
> >
> > Mustafa
> >
> >
>
>


