Re: Joining Arabic Letters

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sat, 31 Mar 2012 08:55:28 +0200

A test table for all Arabic characters that have defined joining types
(and most characters that are not joining) can be seen on this page:

http://en.wikipedia.org/wiki/Template:Arabic_alphabet_shapes/joining

This table is sorted by joining type, then by joining group.

You'll note that with many fonts, some characters that are normatively
dual-joining sometimes do not exhibit the mandatory joining, notably
characters that were added more recently. Stranger still, the same fonts
exhibit the left joining but not the right joining, even though these
characters are normatively dual-joining. (You can ignore the letters that
are not supported at all: they are displayed as squares, with just a
small non-connecting tatweel on either side.)

So far I have not seen any existing Arabic font that exhibits the
correct normative joining behavior for letters such as U+063D
(the Farsi Yeh with an inverted V above, which is dual-joining like
the Farsi Yeh at U+06CC without the inverted V above, and belongs to the
same joining group). Those fonts map only a single non-joining glyph for
U+063D, but behave correctly for U+06CC. This is true even of all the
Arabic fonts shipped with Windows 7.

Note: this page is a test page, and some errors may remain, but the
expected joinings are based directly on the normative joining types and
joining groups defined in Unicode.

So my comment was relevant, even when just one font is used.
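Because the expected joinings in that table follow only from the joining
types, the contextual form of each letter can be computed from the plain
text alone, before any font (or font boundary) comes into play. Below is a
minimal sketch in Python, assuming a tiny hand-copied excerpt of the
Unicode joining types (the full data is in ArabicShaping.txt in the Unicode
Character Database). The table and function names are hypothetical, and
the rules are a simplified form of the Standard's cursive joining rules:
transparent marks are skipped, ZWJ and tatweel are join-causing, and
joining groups and the lam-alef ligature are ignored.

```python
from typing import List, Optional

# Hypothetical excerpt of Unicode joining types, NOT the full UCD data:
# R = right-joining, L = left-joining, D = dual-joining,
# T = transparent, C = join-causing, U = non-joining.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u063D": "D",  # ARABIC LETTER FARSI YEH WITH INVERTED V ABOVE
    "\u06CC": "D",  # ARABIC LETTER FARSI YEH
    "\u064E": "T",  # ARABIC FATHA (diacritic)
    "\u0651": "T",  # ARABIC SHADDA (diacritic)
    "\u0640": "C",  # ARABIC TATWEEL
    "\u200D": "C",  # ZERO WIDTH JOINER
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
}


def joining_type(ch: str) -> str:
    """Look up a character's joining type, defaulting to non-joining (U)."""
    return JOINING_TYPE.get(ch, "U")


def neighbour_type(text: str, i: int, step: int) -> str:
    """Joining type of the nearest non-transparent character from position i.

    step is -1 for the preceding character and +1 for the following one.
    Transparent marks (type T) are skipped, so a diacritic never breaks a join.
    """
    j = i + step
    while 0 <= j < len(text):
        t = joining_type(text[j])
        if t != "T":
            return t
        j += step
    return "U"  # the text boundary behaves like a non-joining character


def contextual_forms(text: str) -> List[Optional[str]]:
    """Expected contextual form of each character (None if it takes none)."""
    forms: List[Optional[str]] = []
    for i, ch in enumerate(text):
        t = joining_type(ch)
        if t not in ("R", "L", "D"):
            forms.append(None)  # T, C and U characters take no form here
            continue
        prev_t = neighbour_type(text, i, -1)
        next_t = neighbour_type(text, i, +1)
        # Joins to the preceding character (on its right in RTL text).
        joins_prev = t in ("R", "D") and prev_t in ("L", "D", "C")
        # Joins to the following character (on its left in RTL text).
        joins_next = t in ("L", "D") and next_t in ("R", "D", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

For example, <LAM, FATHA, ALEF> yields an initial LAM and a final ALEF,
because the transparent FATHA is skipped, and U+063D between two BEHs gets
a medial form exactly as U+06CC would. A renderer following this approach
would then request that form from whichever font ends up drawing the
letter.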

On 31 March 2012 08:32, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
> I was not speaking about ligatures like lam+alef, but really about the
> contextual forms chosen for base letters (independently of the
> diacritics applied to them, except for a few letters that use
> different contextual shapes in some combinations and that are encoded
> distinctly in the UCS precisely to allow those shapes to differ in
> some joining contexts).
>
> I have never said that the glyphs were mandatory. But the joining
> behavior of each letter (independently of whether ligatures are applied
> on top of them) must be preserved. So in a combination like <LAM,
> diacritic, ALEF>, the joining behavior of each letter must be preserved,
> even if there's a mapping to a single glyph for <LAM, diacritic> that
> itself has no ligature with the following ALEF. In that case it is
> perfectly acceptable to use one font for <LAM+diacritic> and another
> for ALEF. The absence of the ligature in the first font will have no
> impact on the readability of the text, because the ligature is only
> recommended, not mandatory, for the script.
>
> I just want to say that encoding a separate diacritic between base
> letters, which would otherwise join cleanly if only one font were used,
> should not prevent each font from using the correct contextual form when
> a different font is used for each letter, even if these "joins" may not
> look very cleanly connected. Using the non-joining letter forms at font
> boundaries is not acceptable for Arabic.
>
> On 31 March 2012 07:52, Asmus Freytag <asmusf_at_ix.netcom.com> wrote:
>> On 3/30/2012 5:36 PM, Philippe Verdy wrote:
>>>
>>> On 30 March 2012 20:08, Julian Bradfield <jcb+unicode_at_inf.ed.ac.uk> wrote:
>>>>
>>>> On 2012-03-30, Andreas Prilop <prilop4321_at_trashmail.net> wrote:
>>>>>
>>>>> I think a better idea is to always have joining glyphs, even across
>>>>> different typefaces. At least the Unicode Standard should say
>>>>> what should happen when Arabic characters of different typefaces
>>>>> follow each other.
>>>>
>>>> How can it? Unicode is about plain text. As soon as you start talking
>>>> about different typefaces, you're out of scope.
>>>
>>> Not really. Even if there is only one typeface involved, the joining
>>> behavior of Arabic letters is normative and in scope.
>>
>> The discussion was about joining across typeface boundaries, which is
>> nonsense, of course.
>>
>> In order to make characters "join", the glyphs for each have to be
>> designed to allow such "joining". In cases where the join results in a
>> ligature, it's patently obvious that you can't have a typeface boundary
>> in the middle of a ligature...
>>
>> Now there's always something that renderers could do to provide fallback
>> solutions. For example, they could check whether one or the other
>> typeface has the full ligature and arbitrarily move the boundaries of the
>> typeface runs. For a "mandatory" ligature like "lam-alif" that might
>> almost be reasonable (just as fallback rendering of diacritics is
>> somewhat reasonable).
>>
>> However, I would rather have layout engines that work really well in
>> sensible cases than ones that try to cover weird situations ("ransom
>> notes") that don't (or shouldn't) occur in practice.
>>
>> That said, some aspects of script rendering are of course in scope for
>> the Unicode Standard.
>>
>> The natural scope for Unicode derives from character identity.
>>
>> Characters are encoded to represent certain entities in text. For
>> characters that are members of scripts, this means that there is an
>> understood relation between character sequences and words (or fragments
>> of words) in a given writing system that is supported by that script.
>>
>> If the lam alif ligature is "mandatory," that tells the user that the
>> character sequence for it is expected to be <lam, alif>, with no joiner
>> character between the two characters and no dedicated character code
>> for the ligature.
>>
>> The same goes for general joining behavior: for Arabic, the default is
>> described in the Standard, so that users know when to add ZWJ or ZWNJ
>> to override it.
>>
>> And so on...
>>
>> However, it's out of scope for Unicode to mandate anything about how to
>> treat "defective" font bindings; Julian got that right.
>>
>> A./
Received on Sat Mar 31 2012 - 01:58:21 CDT
