Re: New property for reordrant dependent vowels reordering?

From: Richard Wordingham
Date: Sun Sep 04 2005 - 08:07:58 CDT

    Kent Karlsson wrote:

    > Richard Wordingham wrote:
    >> It has become clear from some curious cases, such as Devanagari
    >> TTA + VIRAMA + TTHA + I with the Mangal font, that the orthographic
    >> syllable can depend on the font, and not simply on the characters.
    >> As there is no ligature for TTA and TTHA and no half-form for TTA,
    >> this sequence is two orthographic syllables - TTA + VIRAMA and
    >> TTHA + I.
    > Then there are two orthographic syllables here, per definition.
    > If there is, in addition, any ligating between adjacent
    > orthographic syllables, then that is a separate issue.
    > Are you claiming that reordering may take place over more
    > than one orthographic syllable? If so, that should be carried
    > in the underlying text somehow. It should not be a font
    > dependence, as this would clearly be an orthographic difference.

    The points are that:

    (1) With the Mangal font, TTA + VIRAMA + TTHA + I should be two orthographic
    syllables because of the font's conjunct repertoire, i.e. as the virama must
    be visible, the vowel should attach to the TTHA. This special case (obvious
    to a human) was missed in rule R15 in Section 9.1 of the Unicode standard,
    perhaps because the case of no combination was omitted from Figure 9-3.
    Uniscribe follows the standard literally in this case, with the result that
    for Mangal one sees the two orthographic syllables TTA + VIRAMA + I (I hope I
    have the order right) and TTHA, which is nonsense.

    (2) On the other hand, with the Code 2000 font, TTA + VIRAMA + TTHA + I is a
    single orthographic syllable. (The TTA.TTHA ligature is shown in the
    Unicode standard in Table 9-2.)

    (3) Devanagari TA + VIRAMA + THA + I should be and is a single orthographic
    syllable in both fonts.

    (4) This has already been discussed by others on the Indic list, so I don't
    think it is down to me to make a formal error report.
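
    To make points (1) to (3) concrete, here is a minimal Python sketch of
    font-dependent syllable splitting. It is not Uniscribe's algorithm or the
    text of rule R15. The function names and the two "joinable" sets are
    hypothetical and encode only what is stated above: the Mangal-like font
    cannot join TTA to TTHA, the Code 2000-like font can, and both can join
    TA to THA.

```python
# Devanagari code points used in the examples above.
TTA, TTHA, TA, THA = "\u091F", "\u0920", "\u0924", "\u0925"
VIRAMA, VOWEL_SIGN_I = "\u094D", "\u093F"


def is_consonant(ch):
    # Devanagari consonants KA..HA occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def orthographic_syllables(text, joinable):
    """Split text into orthographic syllables for one font.

    joinable is a hypothetical set of (first, second) consonant pairs
    for which the font has a conjunct ligature or a half-form.
    """
    syllables = []
    current = ""
    for ch in text:
        if is_consonant(ch) and current:
            # C + VIRAMA + C stays one syllable only if the font can join the pair.
            joins = (
                len(current) >= 2
                and current[-1] == VIRAMA
                and (current[-2], ch) in joinable
            )
            if joins:
                current += ch
            else:
                syllables.append(current)
                current = ch
        else:
            # Viramas and vowel signs attach to the syllable in progress.
            current += ch
    if current:
        syllables.append(current)
    return syllables


def visual_order(syllable):
    # Vowel sign I is drawn at the left edge of its orthographic syllable.
    if VOWEL_SIGN_I in syllable:
        return VOWEL_SIGN_I + syllable.replace(VOWEL_SIGN_I, "")
    return syllable


# Assumed conjunct repertoires, for illustration only.
mangal_like = {(TA, THA)}
code2000_like = {(TA, THA), (TTA, TTHA)}

retroflex = TTA + VIRAMA + TTHA + VOWEL_SIGN_I
dental = TA + VIRAMA + THA + VOWEL_SIGN_I

for font_name, font in (("Mangal-like", mangal_like), ("Code 2000-like", code2000_like)):
    for label, seq in (("TTA.TTHA", retroflex), ("TA.THA", dental)):
        parts = [visual_order(s) for s in orthographic_syllables(seq, font)]
        print(font_name, label, len(parts), "syllable(s)")
```

    With the Mangal-like set, the vowel sign stays with TTHA and the virama
    remains visible on TTA. With the Code 2000-like set, the whole sequence
    is one syllable and the vowel sign moves in front of it.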

    >> Do you yet have any examples of R1/R2 as opposed to R3/R4?
    > R3/R4 would be used for (e.g.) Devanagari; see rule R15 (not to be
    > confused with the property value names I suggested) on page 228 of
    > TUS4. I'm not sure about R1/R2, and I'll leave that to be answered
    > by someone more familiar with the Indic scripts than I am.

    It also applies to the virama-model South Indian scripts I'm acquainted
    with - Tamil, Burmese and Khmer. In theory, it's the lack of conjuncts in
    the font that makes Tamil seem different, but I wouldn't be surprised if a
    renderer only considered KSSA as a possible conjunct in Tamil. (I presume
    the SHRI ligature is handled differently.)

    >> I haven't seen any cases where they break consonant-vowel ligatures.
    > Some posted scans would be nice...

    Sorry, I only have examples from the Internet :) As I posted this week on
    the Indic list:

    "Examples of OO can be found in verses 8 and 24 at . An example
    of /dadau/ can be seen at the end of the first line in verse 3 at . (Replace
    'tam2' by 'english' for a transliteration into the Roman alphabet.)"

    "However, you will have to explain away texts like
    ttp:// , which has many
    examples of the first ligature, e.g. /pra.naamam/ at the start of the 3rd
    paragraph. Interestingly, this text puts the superscript at the *end* of
    the akshara, e.g. after vowel sign AA. Or is this not actually in the Tamil
    script? I notice that it has visargas."

    >> theory subscripts shouldn't be any worse than nuktas, and for simple
    >> conjuncts, as in Brahmi, I think they shouldn't break the conjuncts.
    >> After all, they wouldn't present any problems when writing by hand.
    > Where would you then display them? Inside in the middle of the
    > conjunct somewhere? Again, presenting actual existing examples
    > would be nice; if any exist.

    For Brahmi, Khmer and Dai Lanna, the conjuncts are generally not ligated.
    There are a few specific issues, but they go away if one is allowed to
    substitute a superscript for a subscript and vice versa. Automating their
    placement would be more complicated - the ascender of a subscript would have
    to be moved away from the superscript of the base consonant.


