Re: Printing and Displaying Dependent Vowels

From: Antoine Leca (
Date: Fri Mar 26 2004 - 15:05:12 EST

    On Friday, March 26, 2004 7:12 PM, Philippe Verdy wrote:

    > Indic scripts are a bit unique in that they have a syllabic
    > structure decomposed into separate letters with a base consonant and
    > a "combining" (this is not the proper term for Unicode) vowel
    > modifier after it. This differs from European alphabets (Latin,
    > Greek, Cyrillic) or even from some Asian or African syllabaries
    > (notably Hiragana/Katakana) where these grapheme clusters are (almost
    > always) combining sequences coded with a base character and
    > diacritics.

    Where exactly is the difference with say IPA?
    And with Vocalized Perso-Arabic?

    (And it is not all Indic scripts: Thai and Lao behave differently)

    > Indic scripts offer several variations here because there are also
    > half-forms for these vowels,

    Please, define "half form for vowel". This is new to me.

    > A sample with Devanagari could be: <अा> (U+0905 LETTER A, U+093E
    > VOWEL SIGN AA) which should normally be presented like the
    > precomposed: <आ> (U+0906 LETTER AA), but which incorrectly displays
    > the dotted circle with the "Mangal" font.

    Mangal has nothing to do with this. What you are seeing and criticizing is
    Uniscribe's implementation, the fruit of a compromise between performance
    and dealing with special/unusual cases. This case is not clearly specified
    by the Devanagari OpenType specifications, but it appears that the default
    behaviour (considering U+093E as a dependent vowel shown in isolation, and
    rendering it with the added circle) has been "elected" here by the
    implementation. In my own implementation of the same specifications, I
    consider this a perfectly correct and useful sequence (used in India to
    teach the syllabary), so I do not insert the circle and as a result (with
    Mangal) it is shown as you expect.
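
    The two encodings under discussion can be inspected with a few lines of
    Python. This is only a minimal sketch using the standard unicodedata
    module; the helper name describe() is made up for the illustration. It
    lists the characters involved and checks that normalization does not
    turn the two-character sequence into U+0906, so whether the two look
    alike is left entirely to the rendering engine and the font.

```python
import unicodedata

# The two spellings discussed above.
sequence = "\u0905\u093E"   # LETTER A followed by VOWEL SIGN AA
single = "\u0906"           # LETTER AA

def describe(text):
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

for row in describe(sequence) + describe(single):
    print(*row)

# U+0906 has no canonical decomposition, so NFC leaves the sequence as is
# and the two spellings stay distinct at the character level.
print(unicodedata.normalize("NFC", sequence) == single)
```

    The vowel sign reports general category Mc (spacing combining mark),
    which is the kind of property a shaping engine can use to decide that
    the character needs a base.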

    > So an author has to make some notational compromises here. But still,
    > I do think that using NBSP as this empty/null base consonant before
    > the dependent vowel will create the intended Unicode default grapheme
    > cluster.

    About NBSP: I hope Paul will read my other post (direct to Avarangal) and
    will enhance Uniscribe in this respect, allowing NBSP to behave the same as
    SPO in this respect. I am not sure here (one should look at Unicode 2.0),
    but I seem to recall that the behaviour with NBSP was added around 3.0, and
    since Uniscribe has been designed against 2.0...

    > Then it's up to the font or renderer to show the NBSP+vowel
    > cluster properly, without the dotted circle, but it's not a problem
    > of Unicode itself.

    I have been reading the Unicode list for quite some time (and sorry
    Philippe, but I am speaking about the time before you came in). I do not
    know why, but every now and then there are comments from regulars that say
    "This is not a defect of Unicode itself", even when nobody is even thinking
    such a thing. From a psychological point of view, this is quite
    interesting. ;-)

    > If dotted circles appear before the symbol, or if the symbol is shown
    > with a square box for a missing glyph, it's not the fault of Unicode.

    Again! ;-)

    >> Also, something which is probably very relevant to Avarangal, fact
    >> is the implementation from a major vendor in the field, Microsoft
    >> Uniscribe, does retain the dotted circle (if present in the font; if
    >> not, you would probably get the .missing glyph instead).
    > I'm not sure that UniScribe is the cause of this problem.

    I am pretty sure it is! Because if he were using Freetype, he would not have
    any problem displaying the standalone glyph. :-D

    Something more complex would be to have some way to display *various*
    representations of the dependent vowels. The Tamil U+0BC1 and U+0BC2,
    which come to mind, show too much variation for a single such glyph to be
    likely in the font. But for the well-known Burmese AA U+102C, or for
    U+0D41 and U+0D42 in Traditional Malayalam, this might be an open question.
    Here again, using Freetype this is perhaps doable, but with some
    "higher-level" engine it would be much more complex. If the need for it
    arises, the option would probably be to define a user-accessible OpenType
    feature (of the alternate kind).

    > There just
    > appears to exist no GSUB rule in some fonts like Mangal to handle the
    > case of NBSP followed by an Indic vowel sign or combining character,

    Well, we are quite far from the original subject, but anyway...
    You are missing something important about the Indic OpenType specifications.
    Besides (in fact, before) the substitutions and, after them, the
    positioning, which are encoded as the TTO tables GSUB and GPOS, there are
    two stages called "analysing" and then "reordering". Analysing deals mainly
    with splitting the stream into clusters. Reordering then does a number of
    operations, and it is this step that will insert the dotted circle. Or will
    not, depending on how it is programmed.
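
    To make the two stages concrete, here is a toy sketch in Python. It is
    neither Uniscribe's algorithm nor the one laid out in the OpenType
    specifications: the function names, the flags allow_nbsp and
    insert_circle, and the simplified rules (any letter can be a base,
    conjuncts are ignored, only U+093F is treated as a pre-base sign) are
    all assumptions made for the illustration.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
NBSP = "\u00A0"
PRE_BASE_SIGNS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I is drawn before its base

def is_sign(ch):
    """Dependent signs are combining marks (general category Mn or Mc)."""
    return unicodedata.category(ch) in ("Mn", "Mc")

def is_base(ch, allow_nbsp):
    """Assumption: any letter (Lo) can carry a sign; NBSP only if allowed."""
    return unicodedata.category(ch) == "Lo" or (allow_nbsp and ch == NBSP)

def analyse(text, allow_nbsp=True):
    """Stage 1: split the stream into clusters (a base plus its signs)."""
    clusters = []
    for ch in text:
        if is_sign(ch) and clusters:
            head = clusters[-1][0]
            if is_base(head, allow_nbsp) or is_sign(head):
                clusters[-1] += ch      # attach the sign to the open cluster
                continue
        clusters.append(ch)             # start a new cluster
    return clusters

def reorder(cluster, insert_circle=True):
    """Stage 2: give orphan signs a dotted circle, then move pre-base signs."""
    chars = list(cluster)
    if is_sign(chars[0]):
        if not insert_circle:
            return cluster              # the engine chose to show the bare sign
        chars.insert(0, DOTTED_CIRCLE)  # the circle stands in for the base
    base, signs = chars[0], chars[1:]
    pre = [c for c in signs if c in PRE_BASE_SIGNS]
    post = [c for c in signs if c not in PRE_BASE_SIGNS]
    return "".join(pre + [base] + post)

def shape_order(text, allow_nbsp=True, insert_circle=True):
    """Run both stages; GSUB and GPOS processing would come after this."""
    return [reorder(c, insert_circle) for c in analyse(text, allow_nbsp)]

# Show the reordered clusters as code points rather than glyphs.
for cluster in shape_order("\u0915\u093F \u093F"):
    print(" ".join(f"U+{ord(c):04X}" for c in cluster))
```

    In this sketch the whole disagreement comes down to two switches:
    whether NBSP counts as a base during analysis, and whether reordering
    inserts the circle at all. The font never gets a say, since both
    decisions are taken before any GSUB lookup is applied.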

    > I'm not an expert of UniScribe programming, but there may exist some
    > Indic features in Indic fonts, which can be enabled in UniScribe to
    > change the rendering behavior by including some additional (optional)
    > GSUB/GPOS tables found in the OpenType font, to the rendering process.

    I doubt that a "degenerate" sequence such as U+093F following anything but
    a Devanagari consonant, which would be reordered as U+093F U+25CC, is ever
    passed through the Shaping and Placing steps in Uniscribe (but only Paul
    could confirm this). In my case it is (because I handle the dotted circle
    as a consonant), but that is an artefact of the implementation, and it can
    change. I did not look at what Eric does. And anyway I do not know of any
    font that defines some feature for that.

    > What I can see in the Mangal font shipped with Windows XP for example
    > is that it contains many OpenType features for the Devanagari script
    > in the default language system: nukt, akhn, rphf, blwf, half, vatu,
    > pres, abvs, blws, psts, haln, abvm, blwm.
    > There are not many details about what these feature IDs mean on the
    > help link that is provided in the font properties help.

    Take a look at
    (Almost) everything is explained there. Then you could join the VOLT list to
    have access to even more material on the subject.


    This archive was generated by hypermail 2.1.5 : Fri Mar 26 2004 - 16:57:10 EST