Re: Japanese EUC and Shift-JIS text samples?

From: Yung-Fong Tang
Date: Mon Oct 04 1999 - 16:27:51 EDT

HTML could also be treated as plain text from a converter's point of
view, right?

If so...; for Shift_JIS for EUC-JP

Momoi, do you have better data?

Frank da Cruz wrote:

> Does anybody have fairly large ftp-able samples of Shift-JIS
> (Code Page 932) plain text containing a "typical" mixture of
> halfwidth Roman, halfwidth Katakana, and Kanji? (Does anybody
> have an idea what the typical mixture might be over a very
> large sample of Japanese text?)
> Same question for Japanese EUC.
> And for that matter, also JIS-7.
> As far as I know, these are the only three commonly-used
> Japanese character sets (besides Unicode) that include both
> single- and doublewidth characters.
> (For working on conversion to/from Unicode, of course :-)
> - Frank
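
One rough way to measure the "typical mixture" Frank asks about, once a sample is in hand, is to decode it to Unicode and count characters by class. The following is only a sketch in Python: the function names `classify` and `script_mixture` are made up for illustration, the class boundaries (printable ASCII, U+FF61..U+FF9F, U+4E00..U+9FFF) are my own simplification, and the tiny sample string is synthetic, not real corpus data.

```python
from collections import Counter


def classify(ch: str) -> str:
    """Assign one decoded character to a coarse class (assumed boundaries)."""
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:
        return "halfwidth_roman"      # printable ASCII
    if 0xFF61 <= cp <= 0xFF9F:
        return "halfwidth_katakana"   # Halfwidth Katakana range in Unicode
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"                # CJK Unified Ideographs
    return "other"                    # hiragana, fullwidth forms, whitespace, ...


def script_mixture(data: bytes, encoding: str = "shift_jis") -> Counter:
    """Decode a byte sample and count its characters per class."""
    text = data.decode(encoding)
    return Counter(classify(ch) for ch in text)


# Synthetic sample: "ABC", two halfwidth katakana, two kanji, two hiragana.
sample = "ABC\uff71\uff72\u6f22\u5b57\u304b\u306a"

for enc in ("shift_jis", "euc_jp"):
    encoded = sample.encode(enc)
    print(enc, len(encoded), "bytes", dict(script_mixture(encoded, enc)))
```

The same function works on an HTML file read in binary mode, since the markup is plain ASCII and simply lands in the halfwidth Roman count.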
