Re: Subset of Unicode to represent Japanese Kanji?

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Jul 12 2000 - 16:47:49 EDT


On 2000-07-11 at 7:02, Michael Martin wrote:
> English, Dutch, French, German, Italian, Japanese, Portuguese, and Spanish.
> It is my understanding that all of these languages except Japanese can be
> supported with the basic Latin and Latin Supplement subset of Unicode
> (U+0000 ... U+00FF [...]).

Latin-1 was invented to support those languages, but falls short of doing
so adequately. You will need additional characters from the following
ranges:
- Latin Extended-A (e.g. U+0152 and U+0153 for French, U+0133 for Dutch,
  perhaps U+017F for German (if you want to cover Fraktur fonts, that is))
- General Punctuation (e.g. U+201E and U+201A for German; U+2019 and
  probably U+2010 through U+2015 for all of those languages)
- Currency Symbols (at least U+20AC, perhaps also U+20A3, U+20A4,
  U+20A7; note also U+00A3 and U+00A5 in Latin-1, and U+0192 in
  Latin Extended-B)
- Depending on the application envisaged, you may also wish to include
  characters from the following areas:
  - Number Forms (U+2150 through U+218F), particularly fractions
  - Arrows (U+2190 through U+21FF); Box Drawing, Block Elements, and
    Geometric Shapes (U+2500 through U+25FF)
  - Mathematical Operators and Miscellaneous Technical (U+2200 through
    U+23FF); Miscellaneous Symbols and Dingbats (U+2600 through U+27BF)
- Depending on the technology used, you may have to include characters
  from the following ranges:
  - Superscripts and Subscripts (U+2070 through U+209F)
  - Presentation Forms (e.g. ligatures U+FB00 through U+FB06)
  - The Replacement Character U+FFFD
to name just a few :-)
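
For illustration only, a subset like this could be kept as a small range
table and used to check text before display. This is a minimal sketch, not
a recommendation: the names SUBSET_RANGES, in_subset and missing_chars are
made up for this example, and taking the whole Latin Extended-A block and
the whole run of quotation marks (rather than hand-picked characters) is a
simplifying assumption.

```python
# Hypothetical subset table built from the ranges discussed above.
# Whole-block entries are a simplification; trim them to save memory.
SUBSET_RANGES = [
    (0x0000, 0x00FF),  # Basic Latin and Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A (covers U+0133, U+0152, U+0153, U+017F)
    (0x0192, 0x0192),  # U+0192 from Latin Extended-B
    (0x2010, 0x2015),  # hyphens and dashes
    (0x2018, 0x201E),  # quotation marks (covers U+2019, U+201A, U+201E)
    (0x20A3, 0x20A4),  # U+20A3, U+20A4
    (0x20A7, 0x20A7),  # U+20A7
    (0x20AC, 0x20AC),  # U+20AC, the euro sign
    (0xFFFD, 0xFFFD),  # REPLACEMENT CHARACTER
]


def in_subset(ch):
    """Return True if the single character ch falls inside the subset."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUBSET_RANGES)


def missing_chars(text):
    """Return 'U+XXXX' labels for distinct characters outside the subset,
    in order of first appearance."""
    seen = []
    for ch in text:
        if not in_subset(ch):
            label = "U+{:04X}".format(ord(ch))
            if label not in seen:
                seen.append(label)
    return seen


if __name__ == "__main__":
    # U+0152 and U+20AC are covered; the two kanji are not.
    print(missing_chars("\u0152uvre \u6f22\u5b57 \u20ac"))
```

Running missing_chars over your actual message catalogues would show which
additional characters each language really needs.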

Good starting points for your consideration could be
- the EES, cf. <http://www.egt.ie/standards/ees.html>,
- Microsoft's WGL 4 character set, cf.
  <http://www.microsoft.com/typography/OTSPEC/WGL4.htm>.

> The Japanese I must support is the Kanji form. [...] I cannot support
> Unicode in its entirety due to memory constraints.

If I am not mistaken, Kanji are ideographic characters, which would take
the lion's share of memory to implement. You will probably also have to
support kana (hiragana and katakana).
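
To give a rough feeling for the proportions, here is a small sketch that
compares block sizes. The block boundaries are those of the Unicode
Standard; the glyph size (a 16x16 monochrome bitmap per code point) is
purely an assumption for the sake of arithmetic, and the blocks contain
some unassigned code points, so the figures are upper bounds.

```python
# Block boundaries as defined by the Unicode Standard.
BLOCKS = {
    "Basic Latin + Latin-1 Supplement": (0x0000, 0x00FF),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}

# Assumption for illustration: one 16x16 1-bit bitmap per code point.
BYTES_PER_GLYPH = 16 * 16 // 8


def block_size(name):
    """Number of code points in the named block, assigned or not."""
    lo, hi = BLOCKS[name]
    return hi - lo + 1


if __name__ == "__main__":
    for name in BLOCKS:
        n = block_size(name)
        print("{}: {} code points, up to {} bytes".format(
            name, n, n * BYTES_PER_GLYPH))
```

Under these assumptions the ideograph block alone outweighs Latin-1 and
both kana blocks together by a wide margin, so restricting the Kanji
repertoire is where the memory savings would have to come from.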

I do not know Japanese, so others may jump in.

Best wishes,
   Otto Stolz


