Re: Mac OSX + 17,500 Kanji

From: Ienup Sung (ienup.sung@eng.sun.com)
Date: Fri Mar 16 2001 - 22:36:23 EST


Hi,

I just did a rough count and found that a typical Unix Japanese eucJP
locale supports about 12,500 Kanji from JIS X 0208 and JIS X 0212. That
counts only the pre-defined Kanji characters in the national standards,
not symbols, UDCs (user-defined characters), and so on.
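
For anyone who wants to reproduce a rough count of this kind, here is a
minimal sketch in Python. It is not how I counted; it simply asks Python's
built-in "euc_jp" codec which code points in the basic CJK Unified
Ideographs range (U+4E00 to U+9FA5) it can encode. The helper names are
made up for illustration, and the result depends on that codec's mapping
tables, which may differ from what a particular Unix eucJP locale ships.

```python
# Rough count of Kanji encodable in EUC-JP (JIS X 0208 + JIS X 0212).
# Assumption: Python's "euc_jp" codec stands in for a Unix eucJP locale.

def is_encodable(ch, codec="euc_jp"):
    """Return True if the single character ch can be encoded in codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def count_encodable_kanji(codec="euc_jp", first=0x4E00, last=0x9FA5):
    """Count code points in [first, last] that the codec can encode.

    The default range is the CJK Unified Ideographs block as of
    Unicode 3.0; symbols and kana fall outside it and are not counted.
    """
    return sum(
        1 for cp in range(first, last + 1) if is_encodable(chr(cp), codec)
    )


if __name__ == "__main__":
    print("Kanji encodable in euc_jp:", count_encodable_kanji())
```

Passing a different codec name (for example "shift_jis") gives a quick
comparison against an encoding that carries JIS X 0208 only.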

In Solaris 8, pretty much every Unicode/UTF-8 locale, whether ja_JP.UTF-8,
zh_CN.UTF-8, en_US.UTF-8, th_TH.UTF-8, or another, supports roughly 20,000
ideographs from the CJK Unified Ideographs Extension A, CJK Unified
Ideographs, and CJK Compatibility Ideographs blocks of Unicode 3.0. Among
these locales, zh_CN.UTF-8 supports the largest number of ideographs,
about 21,205.
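
For comparison, the three Unicode 3.0 ranges mentioned here can be sized
directly, which gives an upper bound on what any font set could cover. The
sketch below is only arithmetic on the range endpoints as I understand them
from the Unicode 3.0 charts (U+3400 to U+4DB5, U+4E00 to U+9FA5, and U+F900
to U+FA2D); the names in the code are illustrative, and it says nothing
about which glyphs a given locale actually ships.

```python
# Size of the ideograph ranges assigned in Unicode 3.0.
# Assumption: endpoints below are the assigned ranges as of Unicode 3.0.

UNICODE_3_0_IDEOGRAPH_RANGES = {
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DB5),
    "CJK Unified Ideographs": (0x4E00, 0x9FA5),
    "CJK Compatibility Ideographs": (0xF900, 0xFA2D),
}


def range_size(first, last):
    """Number of code points in the inclusive range [first, last]."""
    return last - first + 1


def total_ideographs(ranges=UNICODE_3_0_IDEOGRAPH_RANGES):
    """Sum the sizes of all the given inclusive ranges."""
    return sum(range_size(first, last) for first, last in ranges.values())


if __name__ == "__main__":
    for name, (first, last) in UNICODE_3_0_IDEOGRAPH_RANGES.items():
        print(f"{name}: {range_size(first, last)}")
    print("Total:", total_ideographs())
```

The gap between this total and the roughly 20,000 figure above is simply
the part of those ranges for which a locale has no glyphs.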

I expect other vendors are doing much the same in terms of the number of
supported ideograph glyphs.

With regards,

Ienup

PS. I had a brief chance to look at the Aqua GUI during the Mac OS X
installation (in Japanese), and I have to say it looked wonderful and was
easy on the eyes.

] Date: Fri, 16 Mar 2001 18:13:08 -0800 (GMT-0800)
] From: Stephen Cremin <asianfilmlibrary@mac.com>
] Subject: Mac OSX + 17,500 Kanji
] To: Unicode List <unicode@unicode.org>
]
] I didn't catch Steve Jobs's keynote in Tokyo but I believe he announced that
] Mac OSX (out 24 March) would be supplied with Japanese fonts representing
] 17,500 kanji. Presumably, Mac OSX is using Unicode to represent text
] internally.
]
] I database information in Japanese (as well as Chinese and Korean) and I
] generally take the lowest common denominator approach. I approximate
] characters to what can be displayed back to me [in the Japanese version of
] Mac OS8.0 running various language kits], with a note of the correct Unicode
] codepoint for future reference. Another factor that dictates how accurately
] I store kanji is what I can expect to present to people over the
] internet.
]
] Presumably using UTF-8 encoding, other Mac OSX users who install the Japanese
] fonts can now display these 17,500 kanji leading to greater accuracy in
] online information. But, of course, Mac OS X will still be a minority even
] among Mac users for the next year or more. Forgive me for being so insular,
] but what is the situation in the non-Mac world? How many characters can the
] typical Japanese-enabled UNIX or IBM compatible handle?
]
] I understand that different fonts may not render all codepoints, but what is
] typical for a pre-installed Japanese user's system in the non-Mac world?
] And what changes are taking place in the near future? I don't want a Mac vs
] IBM vs UNIX debate, just an idea of how many kanji I, as a web developer, can
] presume to work with, and in what timeframe.
]
] And any announcements from Apple on Chinese and Korean fonts? If this
] 17,500 kanji refers to a specific "Unicode font" then I presume there are
] various Chinese, Japanese and Korean flavours. And what other advances are
] there in Mac OSX in terms of tagging Unicode to aid sorting, etc? It would
] be more than wonderful if I could assume that every "carbonised" application
] will allow me to sort in my choice of language and its variants [North
] Korean sort order, South Korean sort order, etc], search two-byte text
] reliably, and search two-byte text "intelligently" (allowing for semantically
] correct variants, allowing for commonly mistaken kanji, etc). But I'm
] not so optimistic.
]
] Stephen Cremin


