Re: Unicode for Malayalam Language.

From: Mark Liberman (myl@unagi.cis.upenn.edu)
Date: Tue Oct 21 1997 - 11:18:29 EDT


Glenn Adams wrote:

>Unicode assumes that fonts typically contain
>a larger set of glyphs than the set of characters encoded
>for a particular script. A rendering engine is assumed to
>exist for the purpose of mapping characters to glyphs, selecting
>the appropriate glyph based on lexical context.
[...]
>So, to answer your question, these will not be included in
>future versions of Unicode. That is, unless you can make
>a very strong case for why some glyph cannot be deterministically
>chosen based on simple, non-linguistic context.

Glenn's answer is completely true, but perhaps not truly complete.

As far as I know, no major-vendor Unicode-aware software now includes
a general "rendering engine" capable of handling the character
composition needs of scripts with non-trivial composed characters
(where "trivial" composed characters are the ones that work by just
assigning font widths appropriately).

Furthermore, I don't know of any scheduled major-vendor products that
will have such general Unicode rendering capability, or even any
announced development effort along these lines. This is because such
capability is simultaneously very difficult to implement, and of
negligible economic interest, since all scripts of current economic
importance already have all the characters they need incorporated into
Unicode in precomposed form.
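
A minimal sketch of the precomposed/composed distinction, using
Python's standard unicodedata module. The helper name nfc_length and
the choice of the KA + VIRAMA + KA sequence are illustrative
assumptions, not anything taken from a vendor product. A Latin
accented letter has a precomposed code point, so normalization
collapses it to one character. A Malayalam conjunct has none, so it
stays a sequence that a rendering engine would have to map to a single
glyph.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points left after NFC (canonical composition)."""
    return len(unicodedata.normalize("NFC", text))


# Latin: 'e' + COMBINING ACUTE ACCENT composes to precomposed U+00E9.
latin = "e\u0301"

# Malayalam: KA + VIRAMA + KA (a conjunct). Unicode encodes no
# precomposed character for it, so NFC leaves the sequence unchanged.
malayalam = "\u0D15\u0D4D\u0D15"

print(nfc_length(latin))      # 1
print(nfc_length(malayalam))  # 3
```

Displaying the second sequence as one conjunct glyph is exactly the
job left to the font and the rendering engine.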

As I understand it, to the extent that "rendering engines" are
implemented to handle this aspect of Unicode, it is likely to be on a
script-by-script basis, since the typographical problems tend to vary
from case to case.

I would be very happy to learn that I am wrong about any or all of this.

If I am not wrong, then the speakers of the world's many languages
like Malayalam are in a sort of catch-22 situation. The Glenn Adamses of
the world tell them not to ask for their own precomposed characters,
since Unicode is designed with a generative capacity that should make
such characters unnecessary. The Microsofts and Suns of the world tell
them not to expect any software that actually implements this
generative capacity, since the Europeans, East Asians, etc. have
already insisted that the characters they need should be incorporated
in precomposed form, and therefore the market for general generative
rendering is small and fragmented.

The conclusion, which I believe is intended by no one but in fact
accepted by all parties, is that Unicode will be of little or no value
for the foreseeable future to speakers of languages like Malayalam,
except perhaps as an interchange format.

A second conclusion is that anyone devising a new script should be
sure that it does not involve any character composition, since the
Unicode Consortium is happy to accept characters from any script, no
matter how obscure or irrelevant, as long as there is no way to
generate them from other bits and pieces.

            Regards,

            Mark Liberman


