That Unicode tries hard to eliminate ligatures and variant glyph forms is
not discrimination against non-Roman scripts...it works to their
advantage! Think of how complicated search engines and sorting routines
must be to sort out that redundant confusion! Latin scripts have those
presentation forms only because of legacy 'standards' concerns, and we
should be happy that many non-Roman scripts can be freed from them. Don't
complain...rejoice! On the other hand, I do agree that more work should be
done to link language codes with Unicode text, so that there is a language
context from which to interpret the meaning of the character codes.
Similarly, Microsoft should give more API support to TrueType Open, and
Apple to GX typography, so that intricacies such as ligatures and
directionality can be made generally available for any language.
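To make the search point concrete, here is a minimal sketch (in Python,
using the standard unicodedata module; the tool is incidental, the Unicode
data is what matters). A naive substring search misses a word spelled with
a presentation-form ligature, and compatibility normalization (NFKC)
repairs the match:

    import unicodedata

    text = "my \ufb01le"      # contains U+FB01 LATIN SMALL LIGATURE FI
    print("file" in text)     # False: the ligature defeats a naive search

    folded = unicodedata.normalize("NFKC", text)
    print("file" in folded)   # True: NFKC folds the ligature to 'f' + 'i'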
On Sat, 11 Jan 1997 unicode@Unicode.ORG wrote:
> >The current argument about the Arabic ligatures appears very strange to
> >me, and culturally biased. The Unicode standard defines them as
> >presentation forms, and specifies their equivalence to the preferred
> >encoding as sequences of basic letters. Thus, they are clearly specified
> >to be equivalent to a sequence of basic characters, and anyone who can
> >render Arabic correctly should be able to render them without difficulty.
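That equivalence is machine-readable in the Unicode character database
itself. A small illustration (Python again, purely for demonstration):

    import unicodedata

    lam_alef = "\ufefb"  # U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
    print(unicodedata.decomposition(lam_alef))
    # '<isolated> 0644 0627' -- LAM followed by ALEF
    print(unicodedata.normalize("NFKC", lam_alef) == "\u0644\u0627")
    # True: the presentation form folds to the two basic letters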
> >On the other hand, Unicode and 10646 contain hundreds of pre-composed
> >Latin, Cyrillic and Greek letters, equally superfluous, equally
> >decomposable, and this is acceptable because this is what our Western
> >colleagues are used to.
> I wholeheartedly agree. I haven't looked at Unicode 2.0, but I see
> A acute, a acute, E acute, e acute, I acute, i acute, etc. in
> Unicode 1.0, encoded as single characters. There is also the stand-alone
> acute accent.
> Why, then, is Devanagari forced to represent its ligatures as multiple
> characters, to be deduced from the character encoding, and with the
> requirement (paraphrasing Glen Adams' words) of "complex character
> encoding to glyph translation" schemes?
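For readers who have not met the Devanagari model, a short illustration
(same Python sketch conventions as above): the conjunct KSSA has no
codepoint of its own, so it must be spelled as KA + VIRAMA + SSA, and the
renderer is expected to fuse the sequence into a single ligature glyph:

    import unicodedata

    kssa = "\u0915\u094d\u0937"  # KA + VIRAMA + SSA
    print([unicodedata.name(c) for c in kssa])
    # ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA',
    #  'DEVANAGARI LETTER SSA']
    print(unicodedata.normalize("NFC", kssa) == kssa)
    # True: no normalization form composes this into a single character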
> If Latin were encoded with the same regard that is given to Devanagari,
> then there would be no A acute character; it would have to be entered
> as <A> + <acute sign>. Which glyphs are to be rendered is as easily
> deduced from the character encoding as is anything in Devanagari. And,
> I believe, the way A acute is entered into a tool like TeX is
> as \'A -- essentially, a two-character encoding. Instead, Unicode 1.0
> has a glyph encoding for every letter modified with the acute,
> grave, circumflex, etc. signs.
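And indeed the precomposed Latin letters are formally redundant: each
carries a canonical decomposition to a base letter plus a combining mark,
so the one-character and two-character spellings are canonically
equivalent, exactly parallel to TeX's \'A. A sketch:

    import unicodedata

    precomposed = "\u00c1"   # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
    decomposed = "A\u0301"   # 'A' + U+0301 COMBINING ACUTE ACCENT
    print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True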
> For any other language, Arabic or Hindi, to have a glyph encoding,
> however, is a no-no, and we are told to consider the allographic
> versus the graphemic, to stop thinking like font designers, etc. etc.
> There is no rationality in this. I hope that these irrationalities in
> the Latin encoding have been removed in Unicode 2.0. Or else, I hope
> the purveyors of graphemic purity have the grace to blush.
> From :
> " The Unicode standard ends up straying far from the ideal with respect
> to a number of basic policies, all in the interests of devising a
> small enough character set.
> These compromises, moreover, were made not from a neutral standpoint
> but with the linguistic biases of people in the Latin language sphere
> (especially the English language sphere)."
> I am afraid that as more non-Western people become aware of the Unicode
> standard, they are going to be easily convinced of the truth of the above
> statement. Right now, the software industry and its standards are in
> the custody of the Western nations, but that will not be so forever.
> In other parts of the world, scripts even have religious significance;
> I do not think people will take poor representations of them lightly.
> -arun gupta