RE: RTL PUA?

From: Peter Constable <petercon_at_microsoft.com>
Date: Thu, 25 Aug 2011 02:39:45 +0000

From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Philippe Verdy

> Lookup tables in fonts (at least OpenType) do not work at the character
> level, but at the glyph level: they substitute glyph ids by other glyph ids.

That much is true.

> Sequences of glyph ids are already reordered in visual order by the layout
> engine when they are searched in OpenType lookups, should they be RTL
> glyphs, or Indic glyphs with special reordering requirements (independent
> of the logical ordering of characters/code points).

OpenType lookup tables are agnostic with respect to LTR or RTL; the sequence of glyph IDs in a lookup runs from start to finish. For Indic scripts, some reorderings are assumed to have been applied before lookups are processed. As for bidi, it is _not_ the case that a glyph sequence in a lookup table is ordered in LTR visual order, as Philippe's statement suggests. Rather, it is ordered from start to finish. One might choose to perceive that in LTR/RTL terms, but you certainly don't have to. Whichever way you perceive it will have to correlate with whether you think of an implementation as actually having done some level reordering before OpenType Layout tables are processed. Such reordering certainly is not mandatory for implementations.
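
To make the "start to finish" point concrete, here is a toy sketch in Python. It is not how any particular layout engine works, and the glyph IDs and the LIGATURES table are invented for illustration. It only shows that a substitution keyed on a glyph sequence cares about the order of the glyph run it is handed, and carries no notion of left or right.

```python
from typing import Dict, List, Tuple

# Hypothetical ligature data: a sequence of glyph IDs -> one replacement glyph ID.
# Real fonts assign their own glyph IDs; these numbers are made up.
LIGATURES: Dict[Tuple[int, ...], int] = {
    (10, 11): 50,
}

def apply_ligatures(glyphs: List[int], table: Dict[Tuple[int, ...], int]) -> List[int]:
    """Walk the glyph run from its start to its finish, replacing matches."""
    longest = max((len(key) for key in table), default=0)
    out: List[int] = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(longest, 1, -1):
            key = tuple(glyphs[i:i + n])
            if len(key) == n and key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No sequence starts here; pass the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out

run_in_order = apply_ligatures([10, 11, 12], LIGATURES)   # sequence matches
run_reversed = apply_ligatures([12, 11, 10], LIGATURES)   # same glyphs, no match
```

Whether the engine hands over the run in logical order or after some reordering is its own business; the table itself says nothing about direction.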

> The only lookup table in fonts that works at the character/code point level is
> their "cmap"

Note that the 'cmap' is not typically referred to as a "lookup" table since there is a distinct set of data structures in OpenType that are formally called "Lookup" tables.

> Not all fonts need a "cmap"; for some of them, a default cmap may be implied
> or automatically constructed -- for example Symbol fonts in Windows, that are
> implicitly mapped in a PUA range;

Not true. All OpenType fonts require a cmap table. This is true even of "symbol" encoded fonts. Strictly speaking, symbol-encoded fonts are not encoded using Unicode, and so are not mapped in a PUA range. It is true, though, that they use 16-bit code points and that in many symbol-encoded fonts the code point range used does have numerical values that correlate to those of Unicode PUA characters in the BMP. But years ago Bob Hallissy and I confirmed that symbol-encoded fonts could work with code points in other numerical ranges.
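
A small Python sketch of the distinction, under stated assumptions: the two cmap dictionaries below are hypothetical stand-ins for a symbol-encoded subtable, and the code values and glyph IDs are invented. The BMP private use range U+E000..U+F8FF is the only figure taken from Unicode. The sketch just shows that the mapping is from 16-bit codes to glyph IDs, and that a numerical overlap with the PUA is a property of the codes chosen, not of the mechanism.

```python
from typing import Dict

# BMP private use area, U+E000..U+F8FF inclusive.
BMP_PUA = range(0xE000, 0xF8FF + 1)

def in_bmp_pua(code: int) -> bool:
    """True if a 16-bit code numerically coincides with a BMP PUA code point."""
    return code in BMP_PUA

def glyph_for(cmap: Dict[int, int], code: int) -> int:
    """Look up a glyph ID; glyph 0 (.notdef) stands in for unmapped codes."""
    return cmap.get(code, 0)

# Hypothetical symbol-encoded font whose codes happen to fall in the PUA range.
symbol_cmap_typical: Dict[int, int] = {0xF041: 5, 0xF042: 6}

# Hypothetical symbol-encoded font using a different numerical range.
symbol_cmap_other: Dict[int, int] = {0x0041: 5, 0x0042: 6}

typical_glyph = glyph_for(symbol_cmap_typical, 0xF041)
other_glyph = glyph_for(symbol_cmap_other, 0x0041)
```

Both lookups resolve to a glyph in the same way; only the first set of codes happens to coincide numerically with PUA values.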

Peter
Received on Wed Aug 24 2011 - 21:42:28 CDT
