I have been hacking away at Java's inability to correctly recognize
little-endian text as produced by NT. I think I have a software solution;
now I need text data to test it.
Thanks for all your help so far. I will look at the pages you have
recommended and will keep you updated.
The Java converters are very picky about the text you give them, but a
simple byte "flipper" worked for me. :)
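The flipper itself is not shown in this thread. What follows is only a minimal sketch of the idea. It assumes the input is UTF-16LE as NT writes it, optionally starting with the FF FE byte-order mark. It swaps every pair of bytes and decodes the result as big-endian UTF-16. The class and method names (`ByteFlipper`, `flip`, `decodeLittleEndian`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteFlipper {

    /** Swaps each pair of bytes (UTF-16LE <-> UTF-16BE); an odd trailing byte is copied unchanged. */
    static byte[] flip(byte[] in) {
        byte[] out = new byte[in.length];
        int i = 0;
        for (; i + 1 < in.length; i += 2) {
            out[i] = in[i + 1];
            out[i + 1] = in[i];
        }
        if (i < in.length) {
            out[i] = in[i];
        }
        return out;
    }

    /** Decodes little-endian UTF-16 by flipping the bytes and reading them as big-endian. */
    static String decodeLittleEndian(byte[] le) {
        int start = 0;
        // Skip the FF FE byte-order mark that NT typically writes at the start of the file.
        if (le.length >= 2 && (le[0] & 0xFF) == 0xFF && (le[1] & 0xFF) == 0xFE) {
            start = 2;
        }
        byte[] body = Arrays.copyOfRange(le, start, le.length);
        return new String(flip(body), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "Hi" followed by U+4E2D, encoded as UTF-16LE with a BOM: FF FE 48 00 69 00 2D 4E
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00, 0x2D, 0x4E};
        String s = decodeLittleEndian(le);
        System.out.println(s.length());               // 3
        System.out.println(s.equals("Hi\u4E2D"));     // true
    }
}
```

On a current JDK you may not need the swap at all. `new String(bytes, StandardCharsets.UTF_16)` reads the byte-order mark and drops it. `StandardCharsets.UTF_16LE` decodes headerless little-endian input directly. The explicit flip above just mirrors the workaround described in this message.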
On Sun, 8 Mar 1998, Jake Morrison wrote:
> An excellent source is the pages for the 10th International Unicode
> It also has the data in Unicode, so you can check your work.
> Another option is to surf the home pages for the major Asian companies.
> If you want lots of random text (sometimes very random :-), you can get
> messages from Usenet news.
> The tw.* hierarchy is from Taiwan
> The hk.* hierarchy is from Hong Kong
> The fj.* hierarchy is from Japan
> The han.* hierarchy is from Korea
> On Sun, 8 Mar 1998, Mustafa Hasham wrote:
> > Hi:
> > As part of a project in a CS class, I intend to convert CJK encoded text
> > files into Unicode. I am using Windows NT and program in Java. Does anyone
> > out there know of any sample text files I can use? Any encoding scheme
> > would be fine: Big5, Kanji, GB, etc. I do not have access to an input
> > method editor.
> > Thanks
> > Mustafa
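For the conversion Mustafa describes, here is a hedged sketch using the modern `java.nio` API (not the 1998-era Java 1.1 classes). It assumes the source encoding is known in advance and passed by its Java charset name, such as "Big5", "GB2312", or "Shift_JIS". It also assumes the output should be UTF-8. The class and method names (`CjkToUnicode`, `decode`, `convertFile`) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CjkToUnicode {

    /**
     * Decodes raw bytes in a legacy CJK encoding into a Java (UTF-16) String.
     * Malformed input is replaced with U+FFFD rather than raising an error;
     * an unknown charset name throws UnsupportedCharsetException.
     */
    static String decode(byte[] raw, String encoding) {
        return new String(raw, Charset.forName(encoding));
    }

    /** Reads a file in the given encoding and writes it back out as UTF-8 (assumed target). */
    static void convertFile(Path in, String encoding, Path out) throws IOException {
        byte[] raw = Files.readAllBytes(in);
        String text = decode(raw, encoding);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CjkToUnicode input.txt Big5 output.txt
        if (args.length == 3) {
            convertFile(Paths.get(args[0]), args[1], Paths.get(args[2]));
            return;
        }
        // Without arguments, round-trip a sample through Big5 to show the decoding step.
        String sample = "\u4E2D\u6587";
        byte[] big5 = sample.getBytes(Charset.forName("Big5"));
        System.out.println(decode(big5, "Big5").equals(sample));
    }
}
```

Once decoded, the text lives in Java's internal UTF-16 strings. Writing it with `StandardCharsets.UTF_16LE` instead of UTF-8 would produce little-endian output closer to what NT uses, though without a byte-order mark.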
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:39 EDT