Re: New property for reordrant dependent vowels reordering?

From: Richard Wordingham
Date: Sun Sep 04 2005 - 15:25:23 CDT

    Eric Muller wrote:

    > Peter Constable wrote:

    >>>(1) With the Mangal font, TTA + VIRAMA + TTHA + I should be two
    >>>syllables because of the font's conjunct repertoire, i.e. as the virama
    >>>will be visible, the vowel should attach to the TTHA. This special case
    >>>(obvious to a human)

    >>Is *not* so obvious to a human. In fact, there are two writing
    >>conventions that you will find in use in the event of a consonant
    >>cluster where C1 has no half form and a conjunct ligature is not used.
    >>One is to place I before the killed consonant, the other placing the I
    >>after the killed consonant / before the live consonant.

    > Indeed. See Rupert Snell, "Beginner's Hindi Script", p 82, in particular:

    > The standardizing authorities have also condemned such clear and
    > elegant conjunct format as <dya> for dya (preferring <da virama ya>)
    > and <ddha> (preferring <da virama dha>); and they even recommended
    > that an i-sign in a syllable such as ddhi should fall _between_ the
    > two components of the conjunct, giving <da virama i ddha> instead of
    > the well-established <i ddha>! The resulting <...da virama i
    > ddha...> buddhimani ("wisdom") is a strange and inelegant form of
    > <...i ddha ...>; but fortunately, Hindi's wiser public ignores such
    > official recommendations and sticks to the old forms that have
    > served so well for centuries.

    That is not the convention that I described as obviously wrong to a human.
    For this latter cluster what I considered obviously wrong would be <i da
    virama dha>. Antoine Leca recently reported ('Re: 28th
    IUC paper - Tamil Unicode New', 23 August 2005 08:51):

    'Then, it put the whole idea at the mercy of the correctness of the initial
    analysis of the engine writers. For example, we had a discussion several
    months ago about Devanagari, in the (rare) case where the resulting base
    glyph of the conjunct does not contain all the consonants (the typical
    example were TT.TTH.I ṭṭha <U+091F, U+094D, U+0920, U+093F> ट्ठि, when the
    ट्ठ is lacking in the font): we find printings where the rendering looks
    like ट्‌ठि, but were not able to find the other possibility (with the
    i-matra at the extreme left), even if it is what is specified and
    implemented in the OpenType rendering engines...'
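
For readers who want to check which characters the quoted sequence <U+091F, U+094D, U+0920, U+093F> contains, here is a minimal Python sketch that lists them in logical (stored) order. The helper name `describe` is only illustrative. The sketch says nothing about how any font or engine renders the sequence.

```python
import unicodedata

# The sequence quoted above, in logical order.
SEQUENCE = "\u091F\u094D\u0920\u093F"

def describe(text):
    """Return (code point label, character name) pairs in logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, name in describe(SEQUENCE):
    print(label, name)
```

The vowel sign I comes last in storage even though it is drawn to the left of some part of the cluster; which part is the point under discussion.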

    >> One can always force the I to go after the killed consonant by
    >>inserting ZWNJ; e.g., < TTA, VIRAMA, ZWNJ, TTHA, I >.

    No - that gives ट्‌ठि (tta virama i ttha). Perhaps you (Peter) meant to say
    'live'. <TTA, I, VIRAMA, ZWNJ, TTHA> can give टि्‌ठ (i tta virama ttha)
    independent of font - under Windows XP it works for me in Notepad and
    Outlook Express but with Word 2002 the dashed circle appears. (The Word
    problem may be because it's running an older Uniscribe - 1.0405.2416.1 -
    which I ought to disable somehow.)
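
To make the two ZWNJ sequences above easy to reproduce in a test file, the following sketch builds them from their Unicode character names and prints their code points. The helper names `from_names` and `code_points` are hypothetical. The result depends on the application and Uniscribe version, as noted above, so the code only constructs the strings.

```python
import unicodedata

TTA = "DEVANAGARI LETTER TTA"
TTHA = "DEVANAGARI LETTER TTHA"
VIRAMA = "DEVANAGARI SIGN VIRAMA"
I = "DEVANAGARI VOWEL SIGN I"
ZWNJ = "ZERO WIDTH NON-JOINER"

def from_names(*names):
    """Build a string from Unicode character names, in the order given."""
    return "".join(unicodedata.lookup(name) for name in names)

def code_points(text):
    """Format a string as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

# <TTA, VIRAMA, ZWNJ, TTHA, I>: the vowel sign follows TTHA in storage.
zwnj_then_ttha_i = from_names(TTA, VIRAMA, ZWNJ, TTHA, I)

# <TTA, I, VIRAMA, ZWNJ, TTHA>: the vowel sign follows TTA in storage.
tta_i_then_zwnj = from_names(TTA, I, VIRAMA, ZWNJ, TTHA)

print(code_points(zwnj_then_ttha_i))
print(code_points(tta_i_then_zwnj))
```

Both strings contain the same five characters; only the position of the vowel sign differs.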

    > It seems to me that you are going beyond what is currently spelled out
    > by Unicode. The discussion about the joiners does not mention how they
    > interact with vowel signs. Rule R15 says "when the dependent vowel I is
    > used ... it is always written to the extreme left of the orthographic
    > syllable." But we do not have a definition of orthographic cluster, and
    > the discussion of ZWNJ does not say something like "ZWNJ terminates an
    > orthographic cluster", it only describes the effect on the adjoining
    > consonants.

    Worse! We do have a definition in Section 9.1, Consonant Conjuncts:

    'The Indic scripts are noted for a large number of consonant conjunct forms
    that serve as orthographic abbreviations (ligatures) of two or more adjacent
    letterforms. This abbreviation takes place only in the context of a
    consonant cluster. An orthographic consonant cluster is defined as a
    sequence of characters that represents one or more dead consonants (denoted
    Cd) followed by a normal, live consonant letter (denoted Cl).'

    Thus something that, like Uniscribe, renders final <RA, VIRAMA, TA, VIRAMA>
    as र्त् (ta repha virama) rather than र्‌त् (ra virama ta virama) is wrong.
    Shome mishtake surely? I think the definition needs a mention of visible
    viramas, but the correction is complicated unless Peter is wrong.
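
The literal reading of the Section 9.1 definition can be sketched as a pattern match: one or more dead consonants (consonant plus virama) followed by a live consonant (a consonant with no virama after it). This is a toy model under stated assumptions, not what any rendering engine does: it treats U+0915..U+0939 as the consonant letters, ignores nukta and the joiners, and the name `find_clusters` is hypothetical.

```python
import re

VIRAMA = "\u094D"
# Assumption: U+0915..U+0939 stand in for "consonant letter".
CONSONANT = "[\u0915-\u0939]"

# One or more dead consonants (Cd), then a live consonant (Cl),
# i.e. a consonant that is not itself followed by a virama.
CLUSTER = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}(?!{VIRAMA})")

def find_clusters(text):
    """Return the substrings that match the literal Cd+ Cl definition."""
    return [m.group(0) for m in CLUSTER.finditer(text)]

# Final <RA, VIRAMA, TA, VIRAMA>: no live consonant, so no cluster.
print(find_clusters("\u0930\u094D\u0924\u094D"))

# <TTA, VIRAMA, TTHA, I>: TTA is dead, TTHA is live, so one cluster.
print(find_clusters("\u091F\u094D\u0920\u093F"))
```

Under this reading the final <RA, VIRAMA, TA, VIRAMA> contains no orthographic consonant cluster at all, which is why a repha form there conflicts with the definition as written.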

