Re: Representative glyphs for combining kannada signs

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Mar 22 2006 - 05:12:40 CST


    From: "Peter Constable" <petercon@microsoft.com>
    >> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    >> Dr. John M. Durdin has shown that Lao does not genuinely need OTL
    >> in order to display legibly. If the combining marks which sometimes
    >> can stack higher over other combining marks are in the font at the
    >> higher positions as default, then the rendering is reasonable.
    >
    > Both Thai and Lao *can* be displayed in that manner without smart-font
    > support -- giving a typewriter-like appearance. That's not the level of
    > expectation we should be setting, however. It would be like saying that
    > English speakers can only work with fixed-pitch text on a computer.

    I don't think that the Saysettha OT font looks typewriter-styled. It has multiple glyphs for several internal size and position variants of diacritics and base letters, and they are chosen using only basic TrueType substitution/positioning tables; reordering (at the code point level) can also be achieved through glyph IDs within substitution tables (though I wonder what limits TrueType engines place on substitution tables, in terms of recursive substitutions).

    An OpenType engine would perform glyph substitution faster, using much less complex feature rules in fonts plus script-specific internal rules, but that does not mean those rules can't be coded in classic TrueType subst/posn tables. The two kinds of tables could even coexist: if the script-specific feature tables are present, the engine benefits from them and can convert code points to glyph IDs and rearrange them according to standard rules, but this should not conflict with the substitution rules performed finally at the core font level (whose entry point is the list of code points, as exposed in its cmap, not the final glyph IDs used in internal substitution tables).
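    The cmap-versus-internal-glyph-ID distinction above can be shown with a toy model. This is only a sketch under assumed values: the code points are real Lao characters (ko, vowel ii, tone mai ek, fitting the thread's topic), but the glyph IDs, the `cmap`/`subst` dictionaries, and the `shape` function are all illustrative, not any real font API. The point is that substitution output (glyph 250, a raised tone-mark variant) need not appear in the cmap at all.

```python
# Toy model (illustrative IDs, not a real font): the cmap maps code points
# to initial glyph IDs; substitution rules then map glyph-ID sequences to
# other glyph IDs, including variants absent from every cmap entry.
cmap = {0x0E81: 10, 0x0EB5: 40, 0x0EC8: 50}  # Lao ko, vowel ii, mai ek
subst = {(40, 50): (40, 250)}  # after vowel ii, use raised tone variant 250

def shape(codepoints):
    # Entry point is the list of code points, resolved through the cmap.
    glyphs = [cmap[cp] for cp in codepoints]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in subst:           # pairwise substitution rule applies
            out.extend(subst[pair])
            i += 2
        else:
            out.append(glyphs[i])
        i += 1 if pair not in subst else 0
    return out

print(shape([0x0E81, 0x0EB5, 0x0EC8]))  # glyph 250 exists only internally
```

    Real engines apply many such lookups in sequence, but the principle is the same: after the first cmap step, everything operates on glyph IDs.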

    Indic scripts are not so complex to encode, given that they normally work with clusters encoded in a normalized order:
    * optional: {{ra OR la OR sa} consonant #0} + halant (only if not followed by an independent vowel)
    * {main consonant #1} OR {independent vowel (treated as an empty consonant)}
    * optional: nukta (consonant modifier for loan words)
    * optional: halant + {{ra OR la} consonant #2}
    * optional: vowel sign (otherwise it's the implied vowel, shown with the attached vertical stem in most full Devanagari consonants)
    * optional: length mark (only for some vowels)
    * optional: candrabindu (vowel modifier)
    * optional: anusvara (nasalization or final n) OR vocalic {r OR l OR rr OR ll}
    * optional: ZWJ (to select the half-forms of dead consonants, i.e. those without a vowel sign)

    ZWNJ (for the full form of a consonant when ligatures are to be blocked there) is not part of the sequence itself, but acts only as a blocker in the rendering process, so that code point is ultimately dropped from the final list of glyphs.

    There may be other cases for some complex consonant clusters that I have forgotten in the list above, but they should be extremely rare (if not, add the missing position to that list).
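    The normalized cluster order above can be sketched as a single regular expression. This is a simplified sketch under my own assumptions: it uses Devanagari code points as stand-ins for the abstract classes, collapses the length-mark and vocalic slots, and allows ZWJ after any tail, so it is looser than the list; the class ranges are real Devanagari blocks, but nothing here is a normative cluster definition.

```python
import re

# Devanagari stand-ins for the abstract classes in the cluster list.
RA, LA, SA = "\u0930", "\u0932", "\u0938"
HALANT = "\u094D"
NUKTA = "\u093C"
ZWJ = "\u200D"
CONSONANT = "[\u0915-\u0939]"    # ka..ha
INDEP_VOWEL = "[\u0905-\u0914]"  # independent a..au
VOWEL_SIGN = "[\u093E-\u094C]"   # dependent vowel signs (matras)
CANDRABINDU = "\u0901"
ANUSVARA = "\u0902"

CLUSTER = re.compile(
    f"(?:[{RA}{LA}{SA}]{HALANT})?"             # optional consonant #0 + halant
    f"(?:{CONSONANT}{NUKTA}?|{INDEP_VOWEL})"   # main consonant (+nukta) or vowel
    f"(?:{HALANT}[{RA}{LA}])?"                 # optional halant + consonant #2
    f"{VOWEL_SIGN}?"                           # optional vowel sign
    f"{CANDRABINDU}?"                          # optional candrabindu
    f"{ANUSVARA}?"                             # optional anusvara
    f"{ZWJ}?"                                  # optional ZWJ (half-form request)
)

print(bool(CLUSTER.fullmatch("\u0915\u094D\u0930\u093F")))  # kri: ka+halant+ra+i
```

    A real shaping engine would of course validate far more than this, but the regex makes the "normalized order" claim concrete.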

    Once you realize that this is the normal way to write those scripts for ordinary language, the substitution tables that create half-forms, special forms, subjoined forms and ligatures are easy to generate automatically from a set of linguistic rules for the script. The only complex cases are the behavior of left matras, which may need to be decomposed and then moved to the beginning of the sequence, and a leading ra-halant when it becomes a diacritic on the last consonant of the cluster; these may require several swap rules applied recursively to perform the "bubble sort", or a large substitution table.
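    The "bubble sort" of left matras can be sketched in a few lines. This is a minimal sketch of the idea only: it assumes the input is a single cluster already mapped to characters (here using Devanagari i-matra U+093F as the left-side mark), and the function name and the pairwise-swap loop are my own illustration of repeated swap rules, not any actual font table mechanism.

```python
HALANT = "\u094D"
LEFT_MATRAS = {"\u093F"}  # Devanagari i-matra renders LEFT of its cluster

def bubble_left(cluster):
    # Repeated pairwise swaps, exactly like one key of a bubble sort:
    # any left matra migrates past the preceding consonants until it
    # reaches the front of the cluster.
    seq = list(cluster)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            if seq[i] not in LEFT_MATRAS and seq[i + 1] in LEFT_MATRAS:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                changed = True
    return "".join(seq)

# "kri" in logical order (ka + halant + ra + i) becomes i + ka + halant + ra,
# i.e. the i-matra is drawn before the conjunct it logically follows.
print(bubble_left("\u0915\u094D\u0930\u093F"))
```

    In a real font this is expressed as contextual substitution/positioning rules over glyph IDs rather than a loop over characters, and it must stop at cluster boundaries; the sketch shows only the swap mechanics.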

    What you finally get is a list of glyph IDs, which can be as refined as good typography requires, because substitution rules need not output only the glyphs mapped from the input code points: they can output internal glyph IDs (not necessarily present in the cmap), so there can be as many glyph IDs as needed for the various variants. In many cases it won't even be necessary to create glyphs for ligatures, as their components can be positioned as if they were diacritics. (A dedicated ligature glyph would be preferable only for special rendering styles, such as drawing only the outlines without filling the shapes; but eliminating such artificial internal borders is better left to a graphics application producing special effects on strings.)

    This could admittedly result in a "strange" way of drawing complex clusters, where they are split into many small graphical subglyphs; hinting at small sizes could then produce bad mutual positioning if the initial placement of the component glyphs is not done with some care. (I think this would affect only the narrowest font styles, which are in any case not the best choice for documents at small font sizes.)



    This archive was generated by hypermail 2.1.5 : Wed Mar 22 2006 - 05:16:21 CST