RE: Benefits of Unicode

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Jan 29 2001 - 11:33:59 EST


Richard Cook wrote:
> Has anybody played devil's advocate to this, with a list of
> "Failings of Unicode"? Are there any? :-) This question might in fact
> result in a longer Benefits list ....

Although I've always been a Unicode fan, Richard's invitation is too
tempting. :-)

I'll add these to David Starner's list:

***
*** Unicode often uses two or more different solutions for a single problem.
***

<example 1>
All the positional forms of a letter in Arabic, Syriac or Mongolian share a
single code point; the rendering engine must select the proper form. The
same problem in Greek and Hebrew has been addressed by using different code
points for final and non-final letters (e.g. Greek σ/ς, Hebrew כ/ך), which
must then be assigned to separate keys on the keyboard.
</example>
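A minimal sketch contrasting the two approaches, written in Python with only the standard `unicodedata` module (the helper names `label` and `fold` are my own, purely illustrative). Greek and Hebrew final forms are distinct characters that compatibility normalization (NFKC) leaves untouched. The Arabic letter BEH, by contrast, is one code point; its positional shapes exist only as compatibility "presentation forms", which NFKC folds back to U+0628.

```python
import unicodedata

# Greek sigma and Hebrew kaf: the final form is a separate character.
GREEK_SIGMA, GREEK_FINAL_SIGMA = "\u03C3", "\u03C2"
HEBREW_KAF, HEBREW_FINAL_KAF = "\u05DB", "\u05DA"

# Arabic BEH: one abstract character for every position in the word.
ARABIC_BEH = "\u0628"
# Compatibility presentation forms of BEH (isolated, final, initial, medial),
# kept for compatibility with older glyph-based encodings.
BEH_PRESENTATION_FORMS = ("\uFE8F", "\uFE90", "\uFE91", "\uFE92")


def label(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def fold(ch: str) -> str:
    """Apply NFKC, which removes compatibility distinctions."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for ch in (GREEK_SIGMA, GREEK_FINAL_SIGMA, HEBREW_KAF, HEBREW_FINAL_KAF):
        # Final forms are not compatibility variants: NFKC keeps them as-is.
        print(label(ch), "->", label(fold(ch)))
    for ch in BEH_PRESENTATION_FORMS:
        # Every positional shape folds back to the single code point U+0628.
        print(label(ch), "->", label(fold(ch)))
```

So software that compares or searches Greek and Hebrew text must know that σ and ς are "the same letter", while for Arabic the letter identity is preserved and only the glyph changes.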

<example 2>
Brahmi-derived scripts (India, Tibet, and Southeast Asia) use subjoined
letters to form consonant clusters. For the scripts of India, Unicode
encodes a subjoined letter as the sequence virama + consonant, while other
scripts (e.g. Tibetan) encode subjoined letters directly as separate
characters.
</example>
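To make the difference concrete, here is a small Python sketch (standard `unicodedata` only; the function names are hypothetical) that spells the same cluster "ska" both ways: Devanagari uses SA + VIRAMA + KA, while Tibetan uses SA followed by the dedicated character TIBETAN SUBJOINED LETTER KA. The Tibetan helper finds the subjoined letter by name, which is just a convenient trick for this illustration, not a general rule.

```python
import unicodedata

DEVANAGARI_VIRAMA = "\u094D"


def indic_cluster(upper: str, lower: str) -> str:
    """Virama model: the cluster is spelled consonant + VIRAMA + consonant."""
    return upper + DEVANAGARI_VIRAMA + lower


def tibetan_cluster(upper: str, lower: str) -> str:
    """Subjoined model: the lower consonant is its own 'subjoined' character.

    Looks the subjoined letter up by name; raises KeyError if none exists.
    """
    name = unicodedata.name(lower)  # e.g. 'TIBETAN LETTER KA'
    subjoined = unicodedata.lookup(
        name.replace("TIBETAN LETTER", "TIBETAN SUBJOINED LETTER", 1))
    return upper + subjoined


def code_points(s: str) -> list[str]:
    """List each character of s as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


# The same cluster "ska" in both scripts.
ska_devanagari = indic_cluster("\u0938", "\u0915")  # DEVANAGARI SA, KA
ska_tibetan = tibetan_cluster("\u0F66", "\u0F40")   # TIBETAN SA, KA

if __name__ == "__main__":
    print(code_points(ska_devanagari))  # three code points
    print(code_points(ska_tibetan))     # two code points
```

Both strings render as a stacked cluster, but a program that processes both scripts has to handle two unrelated encoding models for what is, historically, the same phenomenon.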

<excusing Unicode>
Unicode was not designed on Mount Olympus. It was born in the real world,
where encoding standards already existed. In most cases Unicode has
incorporated these existing character sets, together with their design
choices.
The inconsistency and inelegance resulting from this patchwork are
balanced by the fact that it is easier to adapt Unicode to applications
originally designed for those older standards.
</excusing>

***
*** Unicode has too many CJK ideographs for the layman, but too few for
scholars.
***

<explaining the problem>
No mechanism exists (apart from IDS) to encode ideographs by the components
that constitute them; such a mechanism would have made it possible to
minimize the number of code points required for CJK languages and, at the
same time, to maximize the expressive possibilities for writing Chinese.

An IDS (Ideographic Description Sequence), which I mentioned above, is
Unicode's way to *describe* a missing ideograph by combining two or more
existing ideographs or components with a set of positional operators, the
Ideographic Description Characters (U+2FF0..U+2FFB). However, IDS is
explicitly designed to "describe" characters: it is normally the human
reader who has to make the effort of "decoding" the sequence and building
a mental image of the resulting ideograph.
</explaining>
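A sketch of what software can actually do with an IDS: parse its prefix notation into a tree. This assumes only the twelve operators U+2FF0..U+2FFB, two of which (⿲ and ⿳) take three operands and the rest two; later versions of Unicode added further operators, which this sketch ignores. The names `Node`, `parse_ids` and `describe` are hypothetical. Note that the result is only a structural description; nothing here produces a glyph, which is exactly the limitation described above.

```python
import unicodedata
from dataclasses import dataclass, field

# The twelve Ideographic Description Characters U+2FF0..U+2FFB.
TERNARY_OPS = {"\u2FF2", "\u2FF3"}  # left-middle-right, above-middle-below
BINARY_OPS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY_OPS


@dataclass
class Node:
    op: str                                     # an IDC operator
    parts: list = field(default_factory=list)   # operands: str or Node


def parse_ids(seq: str):
    """Parse a prefix-notation IDS into a tree of Node and str leaves."""
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("incomplete IDS")
        ch = seq[pos]
        pos += 1
        if ch in TERNARY_OPS or ch in BINARY_OPS:
            arity = 3 if ch in TERNARY_OPS else 2
            return Node(ch, [parse() for _ in range(arity)])
        return ch  # a leaf: an ordinary ideograph or component

    tree = parse()
    if pos != len(seq):
        raise ValueError("extra characters after IDS")
    return tree


def describe(tree) -> str:
    """Spell out the structure in words; no glyph is produced."""
    if isinstance(tree, str):
        return tree
    op_name = unicodedata.name(tree.op).removeprefix(
        "IDEOGRAPHIC DESCRIPTION CHARACTER ").lower()
    return f"{op_name}({', '.join(describe(p) for p in tree.parts)})"
```

For example, `describe(parse_ids("⿱木⿰木木"))` yields `above to below(木, left to right(木, 木))`, a description of 森; turning that into an actual character image is still left to the reader (or to a specialized renderer).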

<excusing>
The problem exists only for scholars studying archaic texts and for the few
other people who need unusual ideographs. The great majority of Chinese,
Japanese and Korean users only need a relatively small and well-known subset
of the existing ideographs.
To fulfill the needs of everyday usage, it makes sense to use the simpler
approach: one character = one ideograph.
</excusing>

_ Marco



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT