HTML could also be treated as plain text from the converter's point of view:
http://home.netscape.com/ja for Shift_JIS
http://www.yahoo.co.jp/ for EUC-JP
Momoi, do you have better data?
Frank da Cruz wrote:
> Does anybody have fairly large ftp-able samples of Shift-JIS
> (Code Page 932) plain text containing a "typical" mixture of
> halfwidth Roman, halfwidth Katakana, and Kanji? (Does anybody
> have an idea what the typical mixture might be over a very
> large sample of Japanese text?)
> Same question for Japanese EUC.
> And for that matter, also JIS-7.
> As far as I know, these are the only three commonly-used
> Japanese character sets (besides Unicode) that include both
> single- and doublewidth characters.
> (For working on conversion to/from Unicode, of course :-)
> - Frank
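
For anyone who wants to measure the "typical mixture" on whatever samples
turn up, here is a rough Python sketch, not a definitive tool. It decodes
raw bytes with one of Python's standard codecs ("shift_jis" or "euc_jp")
and tallies coarse character classes. The function names classify() and
mixture() and the class boundaries are my own assumptions for illustration.
Note that if HTML is fed in as plain text, the markup itself is counted as
halfwidth Roman, which will skew the numbers.

```python
from collections import Counter


def classify(ch):
    """Return a coarse class name for one decoded character (assumed ranges)."""
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:
        return "halfwidth Roman"       # printable ASCII
    if 0xFF61 <= cp <= 0xFF9F:
        return "halfwidth Katakana"    # Halfwidth Forms block
    if 0x4E00 <= cp <= 0x9FFF:
        return "Kanji"                 # CJK Unified Ideographs
    if 0x3040 <= cp <= 0x309F:
        return "Hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "fullwidth Katakana"
    return "other"


def mixture(data, charset):
    """Decode bytes with the given codec and count non-space characters per class."""
    text = data.decode(charset, errors="replace")
    return Counter(classify(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Tiny made-up sample: 3 Roman, 2 halfwidth Katakana, 2 Kanji.
    sample = "ABC \uff71\uff72 \u6f22\u5b57"
    for charset in ("shift_jis", "euc_jp"):
        print(charset, dict(mixture(sample.encode(charset), charset)))
```

To run it on a fetched page, read the file in binary mode and pass the
bytes together with the charset the page is known to use.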