Re: RTL PUA?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sun, 21 Aug 2011 21:06:49 +0200

2011/8/21 Peter Constable <petercon_at_microsoft.com>:
> In the OpenType specification, the only data related to glyph mirroring that a rendering engine is assumed to have is the bidi mirroring data from TUS 5.1. (See http://www.microsoft.com/typography/otspec/TTOCHAP1.htm#ltrrtl.) All other glyph mirroring is to be handled using glyph substitution data in OpenType Layout tables in fonts.

In addition, this specification depends heavily on two assumptions:
- the layout engine fully knows the properties of all characters, in
order to implement BiDi reordering as well as BiDi mirroring
- the layout engine fully knows the necessary mappings for the OMPL
table (this assumes that it always implements the latest version of
the UCD)

These assumptions do not hold because:
- an OpenType layout engine will always implement a specific version
of the UCD. Standard properties defined in that version of the UCD
will never cover unassigned characters that get assigned in a later
version. Likewise, the UCD will not provide any normative property for
the PUA. All the engine can then do is apply "default" properties to
unassigned (still unknown) characters, as well as to all PUAs.
- as such it will never be able to determine which runs of text
containing PUAs or unassigned characters are in RTL order or LTR
order.
- if it uses the default LTR order, it will not be able to find any
mirroring mapping in the OMPL, because the OMPL lookup table will only
be searched for runs that have been identified as RTL
- if it uses the default RTL order assumed from some blocks, the OMPL
will still not work with unknown characters/code points (the OMPL only
contains a list of pairs of known (assigned) non-PUA characters), so
character-level mirroring will not work as expected.
- in addition, if it cannot know whether a run of reordered characters
is LTR or RTL, then after mapping them to glyph id's from the cmap
(where the font has an entry for the unknown non-PUA character or the
PUA character), it won't know which of the "ltrm" or "rtlm" tables to
use. If it incorrectly assumes LTR order, which is the default for
PUA, it will only look up in the "ltrm" table, not in the "rtlm"
table. Mirroring will then not work if the LTR or RTL guess was wrong.
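
To illustrate the points above, here is a minimal Python sketch of what
an engine bound to one UCD version can see. It uses the standard
unicodedata module as a stand-in for the engine's built-in UCD; the
helper names ucd_view and is_pua_or_unassigned are hypothetical, not
part of any OpenType implementation.

```python
import unicodedata

def ucd_view(ch):
    """Return (general category, bidi class) from this Python's UCD version."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

def is_pua_or_unassigned(ch):
    # "Co" = private use, "Cn" = not assigned in the implemented UCD version.
    # For these code points the engine has nothing but default properties.
    return unicodedata.category(ch) in ("Co", "Cn")

# A PUA character only ever gets the default left-to-right class,
# whatever script the private agreement actually encodes there.
pua_category, pua_bidi = ucd_view("\ue000")

# An assigned Hebrew letter is known to be right-to-left.
alef_category, alef_bidi = ucd_view("\u05d0")
```

Nothing in this data lets the engine tell an RTL private-use script
apart from an LTR one.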

The only way to change this would be for the OpenType layout engine
to allow overriding its default properties for unassigned or PUA
characters. For the case of BiDi reordering, this would require the
support of an additional lookup table in the OpenType font, containing
overrides for the BiDi character class assigned to characters. Of
course, this lookup table should NEVER be used if the character is
non-PUA and known in the version of the UCD implemented by the layout
engine. The rule would be:
- if the character is not a PUA and is known in the current
implemented version of the UCD, use the known character property of
the UCD (allow no override).
- otherwise, if the character (which is then either a PUA or an unknown
non-PUA) is mapped in the font's "cmap" table, and there's a "BiDi"
lookup table in the OpenType font, and that lookup table provides the
property value for that character, use that property
- otherwise use the default property value (indicated in the UCD and
Unicode specifications).
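
The three-step rule above can be sketched in Python as follows. This is
only an illustration under stated assumptions: the "BiDi" font table
does not exist in OpenType today, so font_bidi_overrides is a
hypothetical dict from code point to BiDi class, cmap is a hypothetical
dict from code point to glyph id, and the fallback default is
simplified to "L" (the real UCD defaults for unassigned code points
vary by block).

```python
import unicodedata

def is_known_non_pua(ch):
    # Known to the implemented UCD and not private use ("Co") or unassigned ("Cn").
    return unicodedata.category(ch) not in ("Co", "Cn")

def resolve_bidi_class(ch, cmap, font_bidi_overrides, default="L"):
    # Rule 1: known non-PUA characters always use the UCD value, no override.
    if is_known_non_pua(ch):
        return unicodedata.bidirectional(ch)
    # Rule 2: PUA or unknown character mapped in the font's cmap and
    # listed in the (hypothetical) font-level BiDi override table.
    cp = ord(ch)
    if cp in cmap and cp in font_bidi_overrides:
        return font_bidi_overrides[cp]
    # Rule 3: otherwise fall back to the default property value
    # (simplified here to a single default).
    return default

# Hypothetical font that declares U+E000 as a right-to-left letter.
example_cmap = {0xE000: 17}
example_overrides = {0xE000: "R", 0x0041: "R"}
pua_class = resolve_bidi_class("\ue000", example_cmap, example_overrides)
latin_class = resolve_bidi_class("A", {0x0041: 3}, example_overrides)
```

Note that the override for U+0041 is ignored: the font cannot change
the class of a character the engine already knows.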

A similar rule can be used as well for the character-level mirroring:
the standard OMPL will be used if and only if the character is not a
PUA and is known in the implemented version of the UCD. Otherwise, an
"OMPL" table in the OpenType font will contain additional character
pairs to look up. Such a lookup will however never be performed if the
character is in an LTR run (which means that this feature is dependent
on the correct implementation of the BiDi override above, which must
be implemented first).
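
A minimal sketch of this mirroring rule, again in Python. The
STANDARD_OMPL dict is only a tiny illustrative excerpt of the real
list, and font_ompl stands for the proposed (hypothetical) font-level
"OMPL" table, modeled as a dict of code point pairs. The run direction
is taken as an input, since it depends on the BiDi override step having
been resolved first.

```python
import unicodedata

# Tiny excerpt of the standard mirroring pairs, for illustration only.
STANDARD_OMPL = {0x0028: 0x0029, 0x0029: 0x0028,
                 0x005B: 0x005D, 0x005D: 0x005B}

def mirror_code_point(ch, run_is_rtl, font_ompl):
    cp = ord(ch)
    # Character-level mirroring is never performed in an LTR run.
    if not run_is_rtl:
        return cp
    # Known non-PUA characters use the standard OMPL only.
    if unicodedata.category(ch) not in ("Co", "Cn"):
        return STANDARD_OMPL.get(cp, cp)
    # PUA or unknown characters use the pairs supplied by the font.
    return font_ompl.get(cp, cp)

# Hypothetical font pairing two private-use brackets.
example_font_ompl = {0xE010: 0xE011, 0xE011: 0xE010}
mirrored_pua = mirror_code_point("\ue010", True, example_font_ompl)
mirrored_paren = mirror_code_point("(", True, example_font_ompl)
```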

Only then can the existing "ltrm" and "rtlm" lookup tables in OpenType
be used as they are today, because the OpenType layout engine reliably
knows which one to use. This allows standard glyph-level mirroring
to be specified (between pairs of glyph-id's).
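
As a rough sketch of that last step: once the direction of a run is
reliably known, picking the glyph-level table is a simple choice. The
font_features dict below is a hypothetical stand-in for the font's
"ltrm" and "rtlm" lookups, reduced to plain glyph-id to glyph-id
substitutions; real lookups live in the font's GSUB table and are more
involved.

```python
def mirror_glyphs(glyph_ids, run_is_rtl, font_features):
    # Choose the glyph-level mirroring table from the resolved run direction.
    tag = "rtlm" if run_is_rtl else "ltrm"
    substitutions = font_features.get(tag, {})
    # Replace each glyph that has a mirrored form; leave the others alone.
    return [substitutions.get(g, g) for g in glyph_ids]

# Hypothetical font data: glyph 40 has a mirrored form 41 in RTL runs.
example_features = {"rtlm": {40: 41}, "ltrm": {}}
rtl_result = mirror_glyphs([12, 40, 13], True, example_features)
ltr_result = mirror_glyphs([12, 40, 13], False, example_features)
```

If the direction guess is wrong, the wrong table is consulted and glyph
40 is left unmirrored, which is exactly the failure described earlier.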

Also, the existing "ltra" and "rtla" lookup tables will be workable to
provide lists of alternate mirrored glyphs (but only for advanced
applications that allow selecting alternate variants). It may be
that this first requires the support of additional variation
sequences (using variation selectors), which are unknown in the
implemented version of the UCD, using an additional lookup table
working under the same rule as above, in order to allow sequences of
PUA+VSn (which will never be part of the UCD, but may be needed under
the PUA convention agreement that the font provides).

One difficulty in this scheme is that all those properties in OpenType
were never meant to be overridable in specific fonts. This means that
they were assumed to be consistent across all fonts. The difficulty
can arise from the behavior of font substitutions. I don't think
this is critical, because substituting the font also means changing
the PUA agreement: the encoded PUA text is then dependent on the
PUA font used to render it.

For the case of unknown (currently unassigned) characters, this causes
no problem at all, provided that fonts are built with the overrides
needed to support a higher version of the UCD without conflicting
values.

The beauty of this solution, where fonts may override the default
properties of unknown characters, is that the OpenType layout will no
longer absolutely need to be updated to support newer versions of the
UCD and newer scripts, because it will be possible to add the missing
properties in compatible fonts instead.

And it also allows experimentation for newer scripts, or any custom
PUA agreement to work and be rendered correctly (the PUA agreement is
transported by the font, and a document can specify which font to use
to be legible).

This is generalisable to all other character properties, not just BiDi
and mirroring.
Received on Sun Aug 21 2011 - 14:09:59 CDT