Re: RTL PUA?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 22 Aug 2011 20:58:23 +0200

2011/8/22 Shriramana Sharma <samjnaa_at_gmail.com>:
> On 08/22/2011 09:00 PM, Philippe Verdy wrote:
>
>>> The font tables themselves contain only ASCII characters I  presume.
>>
>> No. The lookup tables contain sequences of numeric glyph ids (16 bit
>> integers in TrueType and OpenType). Which are also not the code point
>> values, and not the character names or glyph names.
>
> And numeric glyph IDs are still ASCII aren't they? I was just noting that
> the glyph tables themselves don't *use* the actual codepoints of the
> characters getting ligated (while they *refer* to them).
>
>> Let's say that;
>> - the LAMED character is cmap'ped (by its code point value in an cmap
>> for Unicode, or by its code position in a cmap for another legacy
>> 8-bit encoding) to the glyph id 1012,
>> - and the ALEF character is cmapped to the glyph id 1001 (the values
>> of glyph ids are not important, not even their relative order or
>> differences, they don't need to obey any standard),
>> - and the ALEF-LAMED ligature is in glyph id 1540 (the ALEF-LAMED
>> character of the UCS may also be cmapped separately, but this is not a
>> requirement)
>>
>> Then the lookup to perform the ligature will contain : (1012, 1001) ->
>>  (1540).
>
> No! See Behdad's post -- it is clearly said that the lookup will still be in
> logical order (1001, 1012) -> (1540) and not in visual order as you say.
> See? This is what I meant in the other mail by you suggesting that the
> tables containing the characters in visual order and not in logical order,
> to which you replied (without much real explanation I'm afraid):
>
> <quote>No ! I've not "imagined" that. You incorrectly reinterpret
> imaginatively another incorrect imaginative reinterpretation, made by
> someone else, of what I wrote, which did not even suggest that.</quote>
>
>> Glyph id's are presented and scanned in the lookup table, in sequences
>> preordered in visual order by the text layout/shaping engine.
>
> Nope -- they are placed in the lookup table in *logical* order. IIUC the
> entire sequence of glyphs is only reordered from RTL at the very end. Peter
> or Behdad, can you corroborate this?

Hmmm... then this is not very clear in the OpenType specification.
Well, it does not matter which order is physically used in the
stored table, as long as it is consistent.

But this confirms that the OpenType rendering algorithm, the way it is
presented in the OpenType specification, is completely wrong: the Bidi
algorithm is definitely not the first step needed before performing
glyph substitutions.

However, the Bidi algorithm really does need to reorder the glyphs, at
least relative to one another, for GPOS (glyph positioning) to be
applied correctly. By that stage the font to use is completely known:
all "cmap'pings" have already been applied, and no glyph substitution
can occur across distinct fonts, which have independent glyph ids. The
PUA agreement implied by the PUA font has therefore already been
asserted. Nothing then prevents using the font as THE reliable source
of information about which PUAs are RTL and which ones are LTR.

The order of processing should then not be:
 - BiDi algorithm for reordering grapheme clusters
 - font search and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - GPOS
but really:
 - font lookup and font fallback (using cmap)
 - GSUB (lookups of ligatures or discretionary glyph variants)
 - BiDi algorithm for reordering glyphs representing the grapheme
clusters or ligatured grapheme clusters
 - GPOS
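
To make the second ordering concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not how any
real shaping engine is implemented: the glyph ids are the ones from the
example quoted above, the ligature lookup is keyed in logical order as
Shriramana describes, reorder_rtl() is a toy stand-in for the Bidi
algorithm (it just reverses a single RTL run), and GPOS is left out.
All function and table names are hypothetical.

```python
# Toy cmap: code point -> glyph id (ids from the example quoted above).
CMAP = {0x05D0: 1001, 0x05DC: 1012}  # HEBREW LETTER ALEF, HEBREW LETTER LAMED

# Toy GSUB ligature lookup, keyed by glyph ids in *logical* order.
LIGATURES = {(1001, 1012): 1540}


def map_glyphs(text):
    """Step 1: map characters to glyph ids through the font's cmap."""
    return [CMAP[ord(ch)] for ch in text]


def apply_ligatures(glyphs):
    """Step 2: substitute adjacent glyph pairs found in the lookup."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def reorder_rtl(glyphs):
    """Step 3 (toy): reverse one RTL run; NOT the real Bidi algorithm."""
    return list(reversed(glyphs))


def shape(text):
    """cmap -> GSUB -> reordering, in the order proposed above."""
    return reorder_rtl(apply_ligatures(map_glyphs(text)))


alef_lamed = "\u05D0\u05DC"
substituted_first = shape(alef_lamed)
reordered_first = apply_ligatures(reorder_rtl(map_glyphs(alef_lamed)))
```

With these toy tables, substituted_first contains the single ligature
glyph, while reordered_first still contains the two separate glyphs,
because the pair no longer matches a lookup stored in logical order.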

The BiDi algorithm itself absolutely does not have to be changed. This
time there is no PUA with unknown directionality, provided the font
defines the RTL property for these PUAs (the normative LTR is used only
as a default when the font does not specify it).
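
As a rough sketch of that fallback rule (again in Python, and again only
an illustration): the per-font table below is purely hypothetical, since
as far as I know OpenType defines no such table today. Non-PUA
characters keep their normative class from the Unicode Character
Database.

```python
import unicodedata


def bidi_class(ch, font_pua_classes):
    """Return the Bidi class to use for one character.

    font_pua_classes is a HYPOTHETICAL per-font mapping from PUA code
    points to Bidi classes; it is not an existing OpenType table.
    """
    if unicodedata.category(ch) == "Co":  # Co = private-use character
        # The font's declaration wins; otherwise use the normative default.
        return font_pua_classes.get(ord(ch), "L")
    # Non-PUA characters always keep their normative property.
    return unicodedata.bidirectional(ch)


# Hypothetical font declaring U+E000 as right-to-left.
example_font = {0xE000: "R"}
declared = bidi_class("\uE000", example_font)    # font says RTL
undeclared = bidi_class("\uE001", example_font)  # falls back to LTR
```

The Bidi algorithm would then run unchanged on these classes; only the
source of the property differs for the PUA characters.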
Received on Mon Aug 22 2011 - 14:01:28 CDT
