From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 22 2005 - 17:20:11 CDT
From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
> Adam Twardoch wrote:
>
>> Richard Wordingham wrote:
>>
>>> By the way, why can't font-encoded Tamil (e.g. using ASCII codes as a 
>>> hack) display be handled on Windows by a GSUB table that handles the 
>>> re-ordering? Or would that make it Level-2 anyway?  Where can I find a 
>>> definition of 'Level-2'?
>>
>> GSUB tables don't handle the reordering in Indic languages. It's the 
>> responsibility of the OpenType Layout processor, e.g. Uniscribe.
How can an OpenType Layout processor reorder glyphs correctly when all it 
learns from a font is the mapping of single (but whole) codepoints to glyphs? 
That mapping is not enough for characters whose composite glyphs have parts 
that must be reordered separately, because those parts have no individual 
codepoints assigned to them.
To work reliably, the fonts would have to be specially marked so that the 
glyphs associated with each part are assigned predictable PUA codepoints, 
where they can be found in the font's codepoint-to-glyph table.
This suggests an OpenType "feature" to do that, one that defines the necessary 
character part-to-glyph ID mappings (most probably these IDs would be PUAs 
in Unicode-compatible fonts), and a standard to encode those parts with 
known semantics for reordering (so that matras will be displayed properly 
after the first reordering level of Halant and Ra+H). With that feature, 
the original string would have its "composite" characters decomposed into 
their PUA parts, and then only the necessary reordering would occur within 
the range of characters in strings that belong to the same script, for which 
the renderer maintains the private agreement needed to support the PUAs 
described in the feature table.
Without such a feature, the font will just look like a collection of glyphs, 
only some of which are mapped to single codepoints, while the others are 
bound to no codepoint at all (such as the Tamil pre-base/post-base parts of 
matras).
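To make the problem concrete, here is a minimal Python sketch of what a 
renderer has to do with a two-part Tamil matra. It assumes the parts are 
obtained from the Unicode canonical decomposition (U+0BCA = U+0BC6 + U+0BBE) 
and that the pre-base part is then moved in front of the consonant. Whether 
Uniscribe proceeds this way is an assumption on my part, and the helper name 
`reorder_syllable` is hypothetical.

```python
import unicodedata

# Tamil vowel signs that are displayed before the base consonant.
PRE_BASE_MATRAS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # E, EE, AI


def reorder_syllable(syllable):
    """Return code points in visual order for a consonant + matra syllable.

    Hypothetical helper: it only handles one base consonant followed by one
    (possibly two-part) vowel sign, not conjuncts or halant forms.
    """
    # NFD splits two-part matras such as U+0BCA into U+0BC6 + U+0BBE.
    parts = list(unicodedata.normalize("NFD", syllable))
    pre = [c for c in parts if c in PRE_BASE_MATRAS]
    rest = [c for c in parts if c not in PRE_BASE_MATRAS]
    return pre + rest


visual = reorder_syllable("\u0b95\u0bca")  # KA + VOWEL SIGN O
print([f"U+{ord(c):04X}" for c in visual])
```

Note that this only works because the parts of these particular matras happen 
to have codepoints of their own; it says nothing about parts that do not.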
Well, it's difficult to find the docs now: the website www.opentype.org no 
longer points to the specs, but rather redirects to a single page hosted by 
Monotype Imaging. And I can no longer find the necessary OpenType docs on the 
Microsoft.com/typography site: it only speaks about TrueType, and the 
TrueType specs now focus on VOLT and do not say a lot about OpenType 
features. When I look at the section related to Tamil, the most important 
pages now link to redirected 404 "blank" pages, so the specs are now 
incomplete...
Is the OpenType "standard" dead?
In an old version of the spec I have, OpenType fonts for Tamil had to support 
the following GSUB features to work with the Uniscribe reordering engine:
(1) the Language-based forms:
  'akhn' to substitute akhand ligatures
  'half' to substitute half-forms (pre-base forms)
(2) the conjuncts & typographical forms:
  'pres' for pre-base substitutions
  'abvs' for above-base substitutions
  'blws' for below-base substitutions
  'psts' for post-base substitutions
(3) the halant forms:
  'haln' for halant form substitutions
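The list above can be written down as an ordered table. The sketch below only 
records the grouping and order given in my old copy of the spec; the names 
`TAMIL_GSUB_STAGES`, `features_in_order` and `missing_features` are made up 
for illustration and are not part of Uniscribe or of any library.

```python
# Feature tags grouped in the order listed above (from an old copy of the spec).
TAMIL_GSUB_STAGES = [
    ("language-based forms", ["akhn", "half"]),
    ("conjuncts and typographical forms", ["pres", "abvs", "blws", "psts"]),
    ("halant forms", ["haln"]),
]


def features_in_order(stages=TAMIL_GSUB_STAGES):
    """Flatten the stages into one list of feature tags, in listed order."""
    return [tag for _, tags in stages for tag in tags]


def missing_features(font_features):
    """Return the listed tags that a font's set of GSUB features lacks."""
    return [tag for tag in features_in_order() if tag not in font_features]


print(features_in_order())
print(missing_features({"akhn", "pres", "psts"}))
```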
The problem with these features is that they are tables mapping strings of 
glyph IDs, but how can we compute the glyph IDs that represent pre-base and 
post-base half-matras (or the two halves of KSSA), and that are needed 
before looking up those tables?
Also, this model of reordering does not really work with reordered glyphs; 
instead it successively creates ligatures, each painted as a single glyph. 
This may make it difficult to perform sub-cluster selection (for example 
selecting the whole matra without the base consonant), or to give a part a 
distinct color. For those things, we need distinct codes for each part, and 
we eventually need to give them contextual but still distinct/separated 
shapes that effectively need to be ordered, unlike the substitutions above, 
which have no order given that they produce a single glyph ID for their input 
pairs or triplets of glyph IDs...
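A small Python sketch of the N-to-1 behaviour I am describing. The glyph 
names and the single rule are hypothetical (they stand in for what an 'akhn' 
lookup might contain for KSSA), and `apply_ligatures` is a simplified 
illustration rather than a real GSUB implementation.

```python
# Hypothetical glyph names and one ligature rule, such as 'akhn' might hold.
LIGATURES = {("ka", "virama", "ssa"): "k_ssa"}


def apply_ligatures(glyphs, rules=LIGATURES):
    """Replace each matching run of glyphs with one ligature glyph (N-to-1)."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)       # three inputs collapse into one glyph
                i += len(seq)
                break
        else:
            out.append(glyphs[i])     # no rule matched at this position
            i += 1
    return out


shaped = apply_ligatures(["ka", "virama", "ssa", "aa_matra"])
print(shaped)
```

After the substitution the cluster is one glyph ID, so nothing is left in the 
glyph stream to identify, select, or color the parts individually.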
A refined model would encode special features that give the effective 
semantics of glyph IDs before the GSUB features are applied (alternatively, 
the composed glyph ID generated by OpenType GSUB tables could be decomposed 
with a final ordered 1-to-N substitution feature, but this still requires 
feeding the input text with glyph IDs for half matras...).
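The alternative in parentheses, a final ordered 1-to-N substitution, could 
look roughly like the sketch below. The composed glyph name `ko`, the part 
names and the function `apply_multiple_substitution` are all hypothetical; 
this only illustrates the idea and is not how any existing font or renderer 
is known to do it.

```python
# Hypothetical 1-to-N rule: one composed glyph expands into ordered parts.
DECOMPOSE = {"ko": ["e_matra_pre", "ka", "aa_matra_post"]}


def apply_multiple_substitution(glyphs, rules=DECOMPOSE):
    """Expand each composed glyph into its parts, keeping the visual order."""
    out = []
    for glyph in glyphs:
        # Glyphs without a rule pass through unchanged.
        out.extend(rules.get(glyph, [glyph]))
    return out


print(apply_multiple_substitution(["ko", "ma"]))
```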
So all of this leaves me very perplexed about the portability of fonts across 
systems (and I fear that Uniscribe only works reliably by detecting a few 
fonts made or accepted by Microsoft only, whose names would be internally 
hardwired in Uniscribe). This may explain why some scripts can't work on all 
versions of Windows, and why fonts working with Uniscribe are severely tied 
to Uniscribe's implementation (or even worse, its version...).