Re: Joining Arabic Letters

From: Asmus Freytag <>
Date: Fri, 30 Mar 2012 22:52:36 -0700

On 3/30/2012 5:36 PM, Philippe Verdy wrote:
> On 30 March 2012 20:08, Julian Bradfield<> wrote:
>> On 2012-03-30, Andreas Prilop<> wrote:
>>> I think a better idea is to have joining glyphs always even for
>>> different typefaces. At least the Unicode Standard should say
>>> what should happen when Arabic characters of different typefaces
>>> follow each other.
>> How can it? Unicode is about plain text. As soon as you start talking
>> about different typefaces, you're out of scope.
> Not really. Even if there is only one typeface involved, the joining
> behavior of Arabic letters is normative and in scope.

The discussion was about joining across typeface boundaries, which is
nonsense, of course.

In order to make characters "join", the glyphs for each have to be
designed to allow such "joining". In cases where the join results in a
ligature, it's patently obvious that you can't have a typeface boundary
in the middle of a ligature....

Now there's always something that renderers could do to provide
fall-back solutions. For example, they could see whether one or the
other typeface has the full ligature and arbitrarily move the
boundaries of the typeface runs. For a "mandatory" ligature like
"lam-alif" that might almost be reasonable (just as fallback rendering
of diacritics is somewhat reasonable).
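
A minimal sketch of that kind of fallback, purely for illustration: the
run representation (a list of (text, font) pairs in logical order) and
the has_ligature callback are hypothetical and do not correspond to any
real layout engine's API. It only handles plain lam followed by plain
alif and ignores any combining marks between them.

```python
LAM = "\u0644"   # ARABIC LETTER LAM
ALIF = "\u0627"  # ARABIC LETTER ALEF

def shift_runs_for_lam_alif(runs, has_ligature):
    """Move a typeface-run boundary that falls between lam and alif.

    runs: list of (text, font) pairs in logical order (hypothetical model).
    has_ligature: hypothetical callable, font -> bool, saying whether
    that font can draw the lam-alif ligature.
    """
    work = [[text, font] for text, font in runs]
    for i in range(len(work) - 1):
        left, right = work[i], work[i + 1]
        if left[0].endswith(LAM) and right[0].startswith(ALIF):
            if has_ligature(left[1]):
                # Pull the alif into the left run.
                left[0] += ALIF
                right[0] = right[0][1:]
            elif has_ligature(right[1]):
                # Push the lam into the right run.
                left[0] = left[0][:-1]
                right[0] = LAM + right[0]
            # If neither font has the ligature, leave the runs alone.
    # Drop runs that became empty after the shift.
    return [(text, font) for text, font in work if text]

# Example: font "B" has the ligature, font "A" does not.
example = shift_runs_for_lam_alif(
    [("\u0628\u0644", "A"), ("\u0627\u0645", "B")],
    lambda font: font == "B",
)
```

Which side wins when both fonts have the ligature is arbitrary here
(the left run), which matches the "arbitrarily move the boundaries"
caveat above.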

However, I'd rather have layout engines that work really well in
sensible cases than ones that try to cover weird situations ("ransom
notes") that don't (or shouldn't) occur in practice.

That said, some aspects of script rendering are of course in scope for
the Unicode Standard.

The natural scope for Unicode derives from character identity.

Characters are encoded to represent certain entities in text. For
characters that are members of scripts this means that there is an
understood relation between character sequences and words (or
fragments of words) in a given writing system that is supported by
that script.

If the lam-alif ligature is "mandatory," that tells the user that the
character sequence for it is expected to be <lam, alif>, with no joiner
character between the two characters and without the use of any
dedicated character code for the ligature.
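
To make the point about character sequences concrete, here is a small
sketch using Python's standard unicodedata module. It assumes that the
"dedicated character code" in question is the compatibility
presentation form U+FEFB; the variable names are only illustrative.

```python
import unicodedata

LAM = "\u0644"               # ARABIC LETTER LAM
ALIF = "\u0627"              # ARABIC LETTER ALEF
LAM_ALIF_FORM = "\uFEFB"     # compatibility presentation form of the ligature

# The expected plain-text encoding: just the two letters, nothing between.
expected = LAM + ALIF

# The presentation form carries a compatibility decomposition back to
# <lam, alif>, so NFKC normalization recovers the expected sequence.
form_name = unicodedata.name(LAM_ALIF_FORM)
decomposition = unicodedata.decomposition(LAM_ALIF_FORM)
normalized = unicodedata.normalize("NFKC", LAM_ALIF_FORM)

print(form_name)
print(decomposition)
print(normalized == expected)
```

Forming the ligature from the two-letter sequence is then the job of
the font and layout engine, not of the text.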

The same goes for general joining behavior - for Arabic the default is
described in the Standard, so that users know when to add ZWJ or ZWNJ
to override it.
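
As a rough illustration of what such an override looks like in the
character stream (the letters chosen and the helper function are only
examples, not anything the Standard prescribes):

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
BEH = "\u0628"   # ARABIC LETTER BEH
HEH = "\u0647"   # ARABIC LETTER HEH

# Default behavior: these two letters connect when rendered.
default_join = BEH + HEH

# Override: ZWNJ between the letters requests that they not connect.
forced_break = BEH + ZWNJ + HEH

# Override: ZWJ after a lone letter requests a joining form even
# though no letter follows it.
lone_joining = HEH + ZWJ

def strip_join_controls(text):
    """Illustrative helper: remove ZWJ/ZWNJ to get back the bare letters."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")

print([unicodedata.name(ch) for ch in forced_break])
```

The letters themselves are unchanged in all three strings; only the
invisible format characters differ, which is why the defaults need to
be documented for users to know when an override is needed.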

And so on...

However, it's out of scope for Unicode to mandate anything about how to
treat "defective" font bindings - Julian got that right.

Received on Sat Mar 31 2012 - 00:58:22 CDT
