From: Philippe Verdy (email@example.com)
Date: Fri Mar 26 2004 - 13:12:01 EST
From: "Antoine Leca" <Antoine10646@leca-marti.org>
> It seems many are thinking about the section in 2.10, titled "Spacing Clones
> of European Diacritical Marks". I read it as applying to diacritical marks
> (and perhaps only European ones, but the distinction looks blurry to me).
> The beginning of 2.10 makes it quite clear that diacritics are only one class
> (the most important, though) of combining characters. Indic dependent vowels
> are another.
I answered you by saying "diacritics or vowel signs", but yes, it also
includes dependent vowels when they are used to create what are more generally
called "default grapheme clusters", a larger set than the set of "combining
sequences" (made of a base character followed by combining characters).
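To make the notion of a combining sequence concrete, here is a minimal sketch that splits a string using only the general categories in Python's standard `unicodedata` module: a new sequence starts at every character whose category is not a mark (Mn, Mc or Me). The helper name `combining_sequences` is hypothetical, and this is deliberately simpler than the full default grapheme cluster rules of UAX #29.

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Split text into combining character sequences (hypothetical helper).

    A sequence is a base character followed by zero or more characters whose
    general category is a mark (Mn, Mc or Me). A mark at the very start of the
    text has no base and forms a "defective" sequence on its own.
    """
    sequences: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and sequences:
            sequences[-1] += ch   # attach the mark to the current sequence
        else:
            sequences.append(ch)  # a base (or a leading mark) starts a new one
    return sequences

# DEVANAGARI LETTER A + VOWEL SIGN AA (U+093E has general category Mc).
print(combining_sequences("\u0905\u093e"))
# Latin e + COMBINING ACUTE ACCENT (Mn), followed by a separate base "x".
print(combining_sequences("e\u0301x"))
```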
Indic scripts are somewhat unusual in that their syllabic structure is
decomposed into separate letters: a base consonant followed by a "combining"
vowel modifier (Unicode calls these "dependent vowel signs"). This differs from
European alphabets (Latin, Greek, Cyrillic) and even from some Asian or African
syllabaries (notably Hiragana/Katakana), where these grapheme clusters are
(almost always) combining sequences coded with a base character and diacritics.
But if one wants to show the isolated form of an Indic vowel sign, there is an
orthographic convention of using a sort of "vowel holder", i.e. a default
consonant, much as the Arabic and Hebrew scripts use a base letter to carry a
vowel that has no consonant of its own.
Indic scripts offer several variations here, because there are also half-forms
of these vowels, which are not meant to be used in isolation but to complement
a preceding syllable in the same grapheme cluster. It's hard to say which of
these forms an author would want to present for these isolated dependent vowels
because, as their name suggests, they normally depend on a preceding consonant.
So the best way to represent these isolated dependent vowels would be to encode
an empty/null base consonant to force the presentation of the dependent vowel.
An Indic text would more probably use one base consonant and present all
dependent vowels with that consonant. Trying to represent the isolated vowel
creates a theoretical grapheme cluster, which is normally not part of the
ordinary orthography of the Indic words in which these vowels are used.
Another solution would be to code these Indic dependent vowels after the Indic
letter A (for example after U+0905 DEVANAGARI LETTER A), because this letter
also represents the default (inherent) vowel implied by all other consonants.
A sample with Devanagari could be <अा> (U+0905 LETTER A, U+093E VOWEL SIGN AA),
which should normally be presented like the independent vowel letter <आ>
(U+0906 LETTER AA), but which incorrectly displays the dotted circle with the
"Mangal" font.
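The relationship between <अा> and <आ> can be checked with Python's standard `unicodedata` module: U+0906 has no canonical decomposition, so normalization never turns the two-character spelling into the single letter (unlike Latin e + COMBINING ACUTE ACCENT, which NFC composes to U+00E9). The helper name `same_after_nfc` is hypothetical; this is a sketch, not part of any rendering engine.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """Return True if two strings are canonically equivalent (equal after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# U+0906 DEVANAGARI LETTER AA has an empty decomposition mapping...
print(repr(unicodedata.decomposition("\u0906")))
# ...so LETTER A + VOWEL SIGN AA remains a distinct, non-equivalent spelling.
print(same_after_nfc("\u0905\u093e", "\u0906"))
# By contrast, e + COMBINING ACUTE ACCENT is canonically equivalent to U+00E9.
print(same_after_nfc("e\u0301", "\u00e9"))
```

So if the two spellings ever look identical on screen, that comes from the font's own glyph substitutions, not from Unicode equivalence.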
So an author has to make some notational compromises here. But still, I do
think that using NBSP as this empty/null base consonant before the dependent
vowel will create the intended Unicode default grapheme cluster. It is then up
to the font or renderer to show the NBSP+vowel cluster properly, without the
dotted circle; this is not a problem in Unicode itself.
With NBSP, you get this result: < ा> (U+00A0 NBSP, U+093E VOWEL SIGN AA)
which often shows a square, probably because many fonts don't have a glyph for
the isolated form of the vowel sign.
It is true that this looks like a problem, because the dotted circle should not
appear after the NBSP character: the sequence forms a single grapheme cluster
that should be recognized as such (even if this cluster contains two combining
sequences, as it contains two base characters). But the problem lies in the
Mangal font itself (or in the UniScribe engine in Windows), not in Unicode.
In fact, you could just as well wonder how to represent the isolated form of
other Indic combining characters such as anusvara or candrabindu, but here too
Unicode specifies that they should be coded after a space or, preferably, an
NBSP: < > (NBSP), < ं> (NBSP, ANUSVARA), < ँ> (NBSP, CANDRABINDU), < ः> (NBSP,
VISARGA).
If dotted circles appear before the symbol, or if the symbol is shown as a
square box for a missing glyph, it is not the fault of Unicode. So the best
approach in practice is to use a "normal" Indic base character, as in:
<अ> (LETTER A), <अं> (LETTER A, ANUSVARA), <अँ> (LETTER A, CANDRABINDU), <अः>
(LETTER A, VISARGA)
where the sequences fit better with the "normal" Devanagari orthographic and
calligraphic rendering rules implemented in common fonts.
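The two conventions discussed above (an NBSP carrier, or a real base letter such as U+0905) can be sketched as a small helper that refuses anything that is not a single combining mark. `isolated_form` and its `base` parameter are hypothetical names chosen for illustration; whether the result renders without a dotted circle still depends entirely on the font.

```python
import unicodedata

NBSP = "\u00a0"
LETTER_A = "\u0905"  # DEVANAGARI LETTER A

def isolated_form(mark: str, base: str = NBSP) -> str:
    """Attach a single combining mark to a carrier base so it can be shown alone."""
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"not a single combining mark: {mark!r}")
    return base + mark

# ANUSVARA, CANDRABINDU and VISARGA, each shown on both kinds of carrier.
for mark in ("\u0902", "\u0901", "\u0903"):
    for base in (NBSP, LETTER_A):
        seq = isolated_form(mark, base)
        names = ", ".join(unicodedata.name(c) for c in seq)
        print(f"<{seq}> ({names})")
```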
> Also, something which is probably very relevant to Avarangal: the
> implementation from a major vendor in the field, Microsoft Uniscribe, does
> retain the dotted circle (if present in the font; if not, you would probably
> get the .missing glyph instead).
I'm not sure that UniScribe is the cause of this problem. There simply appears
to be no GSUB rule in some fonts, like Mangal, to handle the case of NBSP
followed by an Indic vowel sign or combining character by mapping them to a
single glyph without the default dotted circle. So UniScribe renders the glyphs
it can find for the separate code points, without detecting a "ligature" in
that font that would have allowed it to omit the dotted circle.
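The mechanism described here can be illustrated with a toy model: a greedy longest-match ligature substitution (loosely what a GSUB ligature lookup does, although real lookups operate on glyph IDs, not characters), followed by a fallback that inserts U+25CC DOTTED CIRCLE before a mark left without an acceptable base. Everything in it is hypothetical: the glyph name `aa.isol`, the `valid_bases` set and the fallback rule are assumptions for illustration, not Uniscribe's actual logic.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
NBSP = "\u00a0"

def toy_shape(text: str, ligatures: dict[str, str], valid_bases: set[str]) -> list[str]:
    """Toy shaper: greedy longest-match ligatures, then a dotted-circle fallback.

    Characters stand in for glyphs; ligature values are hypothetical glyph names.
    """
    longest = max((len(k) for k in ligatures), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest ligature first (sequences of 2 or more characters).
        for n in range(min(longest, len(text) - i), 1, -1):
            glyph = ligatures.get(text[i:i + n])
            if glyph is not None:
                out.append(glyph)
                i += n
                break
        else:
            ch = text[i]
            is_mark = unicodedata.category(ch).startswith("M")
            # Assumed fallback: a mark not covered by a ligature and not preceded
            # by an acceptable base is drawn on a dotted circle.
            if is_mark and (i == 0 or text[i - 1] not in valid_bases):
                out.append(DOTTED_CIRCLE)
            out.append(ch)
            i += 1
    return out

consonants = {"\u0915"}                     # DEVANAGARI LETTER KA only, for the demo
with_rule = {NBSP + "\u093e": "aa.isol"}    # hypothetical ligature glyph

print(toy_shape("\u0915\u093e", {}, consonants))         # KA + AA: no circle
print(toy_shape(NBSP + "\u093e", {}, consonants))        # NBSP + AA: circle inserted
print(toy_shape(NBSP + "\u093e", with_rule, consonants)) # ligature found: one glyph
```

In this model, adding a single ligature rule for NBSP + vowel sign is enough to make the dotted circle disappear, which is the kind of font-level fix suggested above.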
I'm not an expert in UniScribe programming, but there may be some Indic
features in Indic fonts that can be enabled in UniScribe to change the
rendering behavior, by adding some additional (optional) GSUB/GPOS lookups
found in the OpenType font to the rendering process.
What I can see in the Mangal font shipped with Windows XP, for example, is that
it contains many OpenType features for the Devanagari script in the default
language system: nukt, akhn, rphf, blwf, half, vatu, pres, abvs, blws, psts, haln, abvm,
There aren't many details about what these feature IDs mean on the help link
provided in the font properties. But some tools may exist to explain what they
mean and how they are enabled, and there may be a reference repository of these
optional "features" for use in applications.
What I know about them is that they allow changing the rendering with optional
substitution and positioning tables (which operate on glyph IDs after the
initial character-to-glyph mapping), for nuktas, half forms, and alternate
presentation forms of RA (such as the reph).
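For reference, the tags listed above are registered in the OpenType Feature Registry. The table below paraphrases their purpose in my own short words (these are not the registry's exact wording) and notes whether each feature's lookups live in GSUB (substitution) or GPOS (positioning). `INDIC_FEATURES` and `describe` are hypothetical names for this sketch.

```python
# Paraphrased descriptions of the Indic feature tags found in Mangal.
# First field: the OpenType table whose lookups implement the feature.
INDIC_FEATURES: dict[str, tuple[str, str]] = {
    "nukt": ("GSUB", "nukta forms: consonant + nukta ligatures"),
    "akhn": ("GSUB", "akhand ligatures such as KSSA and JNYA"),
    "rphf": ("GSUB", "reph: initial RA + virama shown as a mark above"),
    "blwf": ("GSUB", "below-base forms of consonants"),
    "half": ("GSUB", "half forms of consonants inside a conjunct"),
    "vatu": ("GSUB", "vattu variants: consonant + below-base form ligatures"),
    "pres": ("GSUB", "pre-base substitutions"),
    "abvs": ("GSUB", "above-base substitutions"),
    "blws": ("GSUB", "below-base substitutions"),
    "psts": ("GSUB", "post-base substitutions"),
    "haln": ("GSUB", "halant forms: consonant with a visible virama"),
    "abvm": ("GPOS", "above-base mark positioning"),
}

def describe(tags: str) -> list[str]:
    """Describe a comma-separated tag list as shown in the font properties."""
    lines = []
    for tag in (t.strip() for t in tags.split(",")):
        if not tag:
            continue  # tolerate a trailing comma
        table, meaning = INDIC_FEATURES.get(tag, ("?", "not in this sketch's table"))
        lines.append(f"{tag} [{table}]: {meaning}")
    return lines

for line in describe("nukt, akhn, rphf, blwf, half, vatu, pres, abvs, blws, psts, haln, abvm,"):
    print(line)
```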
But simple applications that don't enable these features by default will not
use these additional tables, and so will use a "reasonable" default rendering.
Maybe one of these features needs to be enabled explicitly by applications to
remove these dotted circles (this probably requires specific GUI options in
applications like text editors or word processors to record their use in the
rich-text document, but I don't know how to enable these features in rich-text
formats like XHTML, even with CSS stylesheets).
What is clear is that there is no way to enable these features explicitly in
plain-text files, since there is no standard format control in Unicode to
enable these OpenType font features. Maybe these could become new "characters"
to allocate in plane 14?
This archive was generated by hypermail 2.1.5 : Fri Mar 26 2004 - 14:01:40 EST