Re: New property for reordrant dependent vowels reordering?

From: Richard Wordingham (
Date: Sat Sep 03 2005 - 21:38:43 CDT

    Kent Karlsson wrote:

    > As yet, Unicode does not have any character properties for
    > how reordrant (combining) characters reorder. I would suggest
    > that such a property is introduced. To explain the property
    > values for this suggested new property, first define "extended
    > combining sequence".
    > An extended combining sequence is a combining sequence where
    > one considers any base character that occurs after a virama
    > (cc=9; even if other truly combining characters intervene)
    > as combining.
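To make the quoted definition concrete, here is a minimal sketch of the segmentation it implies. The function name `combining_sequences` and the `extended` flag are hypothetical, and the sketch makes simplifying assumptions: any character of general category M* counts as combining, only letters (category L*) can be absorbed after a virama, and joiners such as ZWJ/ZWNJ are not handled.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class shared by the Indic viramas


def combining_sequences(text, extended=False):
    """Split text into combining sequences.

    With extended=True, a letter that follows a virama (even with other
    combining marks in between) is treated as part of the same sequence.
    """
    sequences = []
    current = ""
    after_virama = False
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        absorbed_base = extended and after_virama and category.startswith("L")
        if current and (is_mark or absorbed_base):
            current += ch
        else:
            if current:
                sequences.append(current)
            current = ch
        if unicodedata.combining(ch) == VIRAMA_CCC:
            after_virama = True
        elif not is_mark:
            # A base character (absorbed or not) uses up the pending virama.
            after_virama = False
    if current:
        sequences.append(current)
    return sequences


# Devanagari TTA + VIRAMA + TTHA + I
word = "\u091f\u094d\u0920\u093f"
print(combining_sequences(word))                 # two plain combining sequences
print(combining_sequences(word, extended=True))  # one extended combining sequence
```

Note that this segmentation looks only at the characters, so it cannot capture the font dependence discussed in the reply that follows.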

    It has become clear from some curious cases, such as Devanagari TTA + VIRAMA
    + TTHA + I with the Mangal font, that the orthographic syllable can depend
    on the font, and not simply on the characters. As there is no ligature for
    TTA and TTHA and no half-form for TTA, this sequence is two orthographic
    syllables - TTA + VIRAMA and TTHA + I. This happens most of the time in
    Tamil, and is a problem Microsoft claim to have faced and solved for

    > I would suggest having the following property values, with their
    > meanings, for the reordrant property:
    > R0: Non-reordrant. (This is not to be listed explicitly in the
    > prospective data file, since most characters have this value
    > for the reordrant property.)
    > R1: Move to the left of the preceding combining sequence.
    > (This is similar to combining class 224 in placement of the
    > glyph, but these ones are not moved by canonical ordering
    > of combining marks since they have combining class 0.)
    > R2: Split; move the left part to the left of the preceding
    > combining sequence.

    Do you yet have any examples of R1/R2 as opposed to R3/R4?

    > R3: Move to the left of the preceding *extended* combining
    > sequence. (This is for the case where the pre-vowel is
    > displayed to the left of the entire orthographic syllable.)
    > R4: Split; move the left part to the left of the preceding
    > *extended* combining sequence.
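As a rough illustration of how the proposed values R0 to R4 might drive a pre-display reordering, here is a sketch under stated assumptions. No such property exists in Unicode; the `REORDRANT` table, the function names, and the particular value assigned to each vowel sign are all hypothetical and chosen only to exercise each rule. Split vowels are divided using their canonical decompositions, which is an assumption of this sketch rather than part of the proposal. The result is a new string in display order; the input string is not modified.

```python
import unicodedata

# Hypothetical assignments, for illustration only (not real Unicode data).
REORDRANT = {
    "\u093f": "R3",  # DEVANAGARI VOWEL SIGN I
    "\u0bc6": "R1",  # TAMIL VOWEL SIGN E
    "\u0bca": "R2",  # TAMIL VOWEL SIGN O (decomposes to E + AA)
}


def split_parts(ch):
    """Return (left, right) parts of a split vowel via canonical decomposition."""
    left, right = unicodedata.normalize("NFD", ch)
    return left, right


def display_order(text):
    """Return the characters of text in display order under R0-R4."""
    out = []
    plain_start = 0  # where the current combining sequence starts in out
    ext_start = 0    # where the current extended combining sequence starts
    after_virama = False
    for ch in text:
        prop = REORDRANT.get(ch, "R0")
        if prop != "R0":
            left, right = split_parts(ch) if prop in ("R2", "R4") else (ch, "")
            start = plain_start if prop in ("R1", "R2") else ext_start
            out.insert(start, left)   # left part moves before the sequence
            if right:
                out.append(right)     # right part stays in place
            continue
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark:
            plain_start = len(out)
            if not after_virama:
                ext_start = len(out)
        out.append(ch)
        if unicodedata.combining(ch) == 9:  # virama
            after_virama = True
        elif not is_mark:
            after_virama = False
    return "".join(out)


# R3: I moves before the whole TTA + VIRAMA + TTHA cluster.
print([hex(ord(c)) for c in display_order("\u091f\u094d\u0920\u093f")])
# R1: E moves only before the second KA in KA + VIRAMA + KA + E.
print([hex(ord(c)) for c in display_order("\u0b95\u0bcd\u0b95\u0bc6")])
```

Because the table is keyed by character alone, it has no way to express a vowel that splits only optionally, nor a syllable boundary that depends on the font.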

    For Malayalam, some of the splitting forms are only optionally splitting!
    Your proposal requires that splitting and non-splitting forms be different

    > The "moves" here are pre-display moves, just like for bidi.
    > The underlying character sequence is not affected.
    > It is not yet clear if <super> and <sub> digits should be moved
    > over as well, which has been suggested.
    > However, from the examples given, it appears that any <super>
    > and <sub> digits occur only *after* the full orthographic
    > (Tamil) syllable. If that is always the case, the rules above
    > would not be affected.

    Not in the example Naga gave - 'An on-line example was mentioned here: . Check out the column
    "Tamil with numbered consonants"'. There the AA symbol and AU length mark
    always follow the subscript.

    > If instead <super> and <sub> digits do occur inside an orthographic
    > syllable, as has been suggested, but no evidence yet given,

    WRONG! See above.

    > the rules
    > for reordrant properties R1-R4 (or perhaps just for R1 and R2) would
    > need to be extended (by, for these rules, considering also <super>
    > and <sub> digits to be combining) to also indicate a move over
    > <super> and <sub> digits. Obviously, any <super> and <sub> digits
    > inside an orthographic syllable would break any ligature/conjunct
    > formation.

    I haven't seen any cases where they break consonant-vowel ligatures. I
    don't believe breaking conjuncts is an actual issue for Tamil. However, if
    someone were to do something as bizarre as use Brahmi or Grantha (not in
    Unicode yet) forms with the Tamil repertoire, it would get interesting. In
    theory subscripts shouldn't be any worse than nuktas, and for simple
    conjuncts, as in Brahmi, I think they shouldn't break the conjuncts. After
    all, they wouldn't present any problems when writing by hand.

    It's not totally bizarre to think of using subscripts or superscripts in
    other Indic scripts. After all, to answer one of Sinnathurai Srivas's
    points, 'H' has been subscripted, sometimes by vowel, sometimes by number,
    in Proto-Indo-European reconstructions, and 'r' and 'd' were being
    subscripted when Proto-Austronesian phonemes were proliferating. (I
    actually looked at a paper with a title like 'Yet Another Proto-Austronesian
    Phoneme'.) Why shouldn't numeric subscripts be used in Devanagari for a
    Hindi popularisation?


    This archive was generated by hypermail 2.1.5 : Sat Sep 03 2005 - 21:44:35 CDT