Kevin Bracey wrote:
> Useless        Basic Latin only             95
> Limited        [...] + halfwidth katakana   158
> Standard       [...] + JIS X 0208           7037
> Above average  [...] + JIS X 0212           13104

If these memory constraints are really hard, there can be several
intermediate levels between "Limited" and "Standard".

First of all, you can remove from JIS X 0208 all the characters that are not
strictly needed to write Japanese. This includes the Greek and Cyrillic
alphabets, and a handful of dingbats and funny things. My wife would kill me
if I did the counting right now... Let's estimate 4 blocks of 94 characters
each (roughly *376* slots saved).

A much more substantial cut can be achieved by selecting only frequently
used kanji. One good source for a reduced set is "Japan-China-Taiwan [...]"
on Koichi Yasuoka's CJK page
(http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html). I counted 1992
kanji in the Japanese column, which corresponds to a net discount of *4366*
characters (the basic 6358 JIS kanji minus Yasuoka's 1992 daily-use kanji).

Another good source for similar figures is Jim Breen's celebrated KANJIDIC
(http://www.csse.monash.edu.au/~jwb/kanjidic.html). The [G] field contains
the kanji "grade" (1 to 6 and 9). It is roughly the school year in which a
Japanese kid learns each kanji: 1-6 is elementary school, and 9 (standing in
for 7, 8 and 9) is junior high school.
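
For what it's worth, here is a rough Python sketch of how such per-grade counts could be tallied. It assumes the classic KANJIDIC text layout as I understand it: one kanji per line, space-separated fields, the grade given as a field like G3, and English meanings in {braces}. The function name and the sample lines are made up for illustration; a real run would read the file itself (the classic distribution is EUC-JP encoded, as far as I know).

```python
import re
from collections import Counter

# A whole field such as "G1" or "G9" (assumed KANJIDIC grade field).
GRADE_FIELD = re.compile(r"G(\d+)")
# English meanings are wrapped in braces; drop them before splitting.
MEANINGS = re.compile(r"\{[^}]*\}")

def count_grades(lines):
    """Return a Counter mapping grade number -> number of kanji."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = MEANINGS.sub(" ", line).split()
        # fields[0] is the kanji, fields[1] its JIS code
        for field in fields[2:]:
            match = GRADE_FIELD.fullmatch(field)
            if match:
                counts[int(match.group(1))] += 1
                break  # at most one grade per kanji
    return counts

# Hypothetical, abbreviated sample lines; not copied from the real file.
sample = [
    "# header comment",
    "一 306C U4e00 G1 S1 イチ ひと.つ {one}",
    "右 3126 U53f3 G1 S5 ウ みぎ {right}",
    "亜 3021 U4e9c G9 S7 ア {Asia} {rank next}",
    "唖 3022 U5516 S10 ア {mute}",
]
print(count_grades(sample))
```

Kanji without a G field (the last sample line) are simply left out of the tally, which is what you want when building a reduced set.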

My counts for the seven grades, with running totals, are:
1: 46, 46
2: 105, 151
3: 186, 337
4: 203, 540
5: 193, 733
6: 142, 875
9: 959, 1834
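
The running totals in the list above are easy to re-derive. A minimal Python sketch, using only the per-grade counts quoted in this message (the variable names are mine):

```python
from itertools import accumulate

# Per-grade kanji counts as quoted in the message (grades 7 and 8 are unused).
per_grade = {1: 46, 2: 105, 3: 186, 4: 203, 5: 193, 6: 142, 9: 959}

# Pair each grade with the running total of counts up to and including it.
running = dict(zip(per_grade, accumulate(per_grade.values())))

for grade, total in running.items():
    print(f"{grade}: {per_grade[grade]}, {total}")
```

The printed lines should match the list as given, ending with 1834 kanji up to grade 9.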

If you pick all the graded characters, up to grade 9, you should have more or
less the same list of daily-use characters mentioned above. (I think
Yasuoka's list is more up to date, as it probably reflects more recent
reforms in Japan's schooling system.)

If you stop at grade 6, you have (if I'm not mistaken) the famous *Toyou
Kanzi* list, which is the dream of every foreign student of Japanese. This
would make for a huge saving of *5483* characters! However, you must be sure
that your application can make do with a relatively basic vocabulary and,
in particular, that it doesn't need many proper names (people or places).
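
To compare cut-off grades side by side, the same subtraction can be repeated for each one. This is only a sketch: 6358 is the JIS kanji figure used in this message, 7037 is the "Standard" size from the quoted table, and the "remaining" figure assumes everything else in the Standard repertoire is kept unchanged. The function names are mine.

```python
JIS_KANJI = 6358  # kanji count used in this message for JIS X 0208
STANDARD = 7037   # size of the "Standard" level in the quoted table

# Cumulative kanji totals up to each cut-off grade, from the counts
# listed earlier in the message.
cumulative = {2: 151, 6: 875, 9: 1834}

def saving(cutoff_grade):
    """Kanji slots saved by keeping only grades up to cutoff_grade."""
    return JIS_KANJI - cumulative[cutoff_grade]

def remaining(cutoff_grade):
    """Repertoire size left if only the kanji cut is applied (assumption)."""
    return STANDARD - saving(cutoff_grade)

for grade in sorted(cumulative):
    print(f"up to grade {grade}: save {saving(grade)}, keep {remaining(grade)}")
```

For grade 6 this reproduces the 5483 figure above.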

You could even consider stopping at grade 2. This is the Zyouyou Kanzi list,
which is the basic literacy level for a Japanese. In this case, you would
nearly reach the numbers of a single-byte character set. The drawback, of
course, is that your application will write Japanese as well as a 7-year-old.

None of these reductions is viable for a general-purpose application that
has to handle Japanese text. However, if it is just for the messages issued
by a print head controller, who knows...
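
For a fixed message catalogue, whether a reduced set is enough can be checked mechanically. A minimal Python sketch, assuming the messages are available as Unicode strings and treating anything in the CJK Unified Ideographs block (U+4E00 to U+9FFF) as a kanji; the reduced set and the messages below are invented for illustration:

```python
def is_kanji(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def missing_kanji(messages, allowed):
    """Return the kanji used in messages that are not in the allowed set."""
    used = {ch for msg in messages for ch in msg if is_kanji(ch)}
    return sorted(used - set(allowed))

# Hypothetical reduced set and message catalogue, for illustration only.
allowed = "紙用一二三"
messages = ["用紙がありません", "印刷中"]

print(missing_kanji(messages, allowed))
```

An empty result would mean every kanji in the catalogue is covered; anything listed has to be added to the set or written in kana instead.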