Re: Unicode for Malayalam Language.

From: Kenneth Whistler
Date: Tue Oct 21 1997 - 15:28:45 EDT

Mark Liberman commented:

> If I am not wrong, then the speakers of the world's many languages
> like Malayalam are in a sort of catch-22 situation. The Glenn Adamses of
> the world tell them not to ask for their own precomposed characters,
> since Unicode is designed with a generative capacity that should make
> such characters unnecessary. The Microsofts and Suns of the world tell
> them not to expect any software that actually implements this
> generative capacity, since the Europeans, East Asians etc. have
> already insisted that the characters they need should be incorporated
> in precomposed form, and therefore the market for general generative
> rendering is small and fragmented.

You've already heard from Microsoft, which is busy building correct
rendering engines for Middle Eastern, Southeast Asian, and South
Asian scripts. And you've heard from Apple, which has had such
technology for some time now.

> The conclusion, which I believe is intended by no one but in fact
> accepted by all parties, is that Unicode will be of little or no value
> for the foreseeable future to speakers of languages like Malayalam,
> except perhaps as an interchange format.

I won't repeat the points made by Glenn Adams.

However, I would like to counter the implication that it would
improve the situation to give up on combining marks and "generative
rendering" for scripts such as Malayalam, and instead to encode as
characters all the conjuncts, half-forms, or ligatures that such
scripts require.

Do you really think that tracking down all the conjunct forms used
in Hindi (and Sanskrit and Nepali and Bihari and Jaipuri and...) and
encoding them all as characters in Unicode/10646 would *simplify*
the problem of implementing text handling and rendering of
Devanagari, or bring such systems to market one whit earlier or
in better quality? Those who have experience in implementing such
systems insist otherwise.
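
For illustration only, here is a minimal Python sketch of what the
"generative" encoding looks like in practice. It assumes the
Devanagari conjunct kssa as the example: the text stores three
ordinary characters (KA, VIRAMA, SSA), and forming the conjunct glyph
is left to the renderer. The helper name `describe` is hypothetical;
only the standard `unicodedata` module is used.

```python
import unicodedata

# The conjunct "kssa" as stored in text: consonant + virama + consonant.
# No precomposed conjunct character is involved.
KSSA = "\u0915\u094D\u0937"

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code_point, name in describe(KSSA):
    print(code_point, name)
```

Any other conjunct is spelled the same way, from the same small
inventory of consonants plus the virama, so the character set does
not grow with the number of conjunct shapes a given font supports.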

A good example can be found in Unicode itself for the Arabic
script. A vast pile of Arabic ligatures was encoded in Unicode/10646
at the insistence of Egypt (see U+FDB3 to U+FDFB). Has the fact of
the existence of these encodings done anything whatsoever to get
Arabic implementations based on Unicode to market faster, cheaper,
or better? On the contrary, Unicode Arabic implementations basically
ignore these ligature "characters", and implement Arabic text correctly,
using the basic Arabic characters in the U+0600 block. The
existence of the Arabic ligature encodings represents an annoying
"compatibility" kluge that has to be dealt with just in case anyone
is foolish enough to actually use them and expects a Unicode-based
Arabic rendering system to handle them correctly. The particular
set of Arabic ligatures encoded is of no practical use whatsoever,
since it is, of course, incomplete, ignores stylistic differences,
and does nothing about the multilevel rendering required for
high-level Arabic typography. So the existence of the ligature codes
does nothing to improve the quality of Arabic rendering in the
implementations, and is just a useless suitcase of compatibility
dreck which has to be carried around.
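
As a rough sketch of the "has to be dealt with" part, and not a
description of any particular vendor's implementation: one way a
system might cope with a ligature code point is to fold it back to
the basic letters through compatibility normalization before doing
anything else. The example below assumes Python's standard
`unicodedata` module and uses U+FDF2 as the sample ligature; the
function name `fold_ligature` is hypothetical.

```python
import unicodedata

def fold_ligature(text):
    """Replace compatibility characters (such as the Arabic ligature
    code points) with their compatibility decompositions via NFKC."""
    return unicodedata.normalize("NFKC", text)

ligature = "\uFDF2"  # ARABIC LIGATURE ALLAH ISOLATED FORM
folded = fold_ligature(ligature)

# Show which code points remain after folding.
print([f"U+{ord(ch):04X}" for ch in folded])
```

After folding, the single ligature code point becomes four ordinary
letters from the U+0600 block, and whether they are drawn as a
ligature is again the renderer's decision.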

Moral of the story: encoding too much can make things worse and
*delay* implementations that people can actually use. A knee-jerk
reaction to claims that "you didn't code all my characters" can
hurt, rather than help.

--Ken Whistler
