From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Sep 03 2005 - 21:38:43 CDT
Kent Karlsson wrote:
> As yet, Unicode does not have any character properties for
> how reordrant (combining) characters reorder. I would suggest
> that such a property be introduced. To explain the property
> values for this suggested new property, first define "extended
> combining sequence".
>
> An extended combining sequence is a combining sequence where
> one considers any base character that occurs after a virama
> (cc=9; even if other truly combining characters intervene)
> as combining.
It has become clear from some curious cases, such as Devanagari TTA + VIRAMA
+ TTHA + I with the Mangal font, that the orthographic syllable can depend
on the font, and not simply on the characters. As there is no ligature for
TTA and TTHA and no half-form for TTA, this sequence is two orthographic
syllables - TTA + VIRAMA and TTHA + I. Such splitting at the virama happens
most of the time in Tamil, and is a problem Microsoft claim to have faced
and solved for Malayalam.
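For concreteness, here is a rough Python sketch of the "extended combining
sequence" as defined above. It assumes that "combining" means General
Category M* and that a virama is any character with combining class 9; the
function name extended_sequences is made up for illustration. It looks only
at the characters, so it cannot capture the font dependence just described.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class of viramas


def is_combining(ch):
    # Assumption: "combining" means General Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def extended_sequences(text):
    """Split text into extended combining sequences (character-based only)."""
    sequences = []
    current = ""
    after_virama = False
    for ch in text:
        if is_combining(ch):
            current += ch
            if unicodedata.combining(ch) == VIRAMA_CCC:
                # Stays set even if other combining marks intervene.
                after_virama = True
        elif after_virama and current:
            # A base character after a virama is treated as combining.
            current += ch
            after_virama = False
        else:
            # An ordinary base character starts a new sequence.
            if current:
                sequences.append(current)
            current = ch
            after_virama = False
    if current:
        sequences.append(current)
    return sequences
```

On TTA + VIRAMA + TTHA + I this yields a single sequence, which is exactly
what Mangal does not display as one orthographic syllable.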
> I would suggest having the following property values, with their
> meanings, for the reordrant property:
>
> R0: Non-reordrant. (This is not to be listed explicitly in the
> prospective data file, since most characters have this value
> for the reordrant property.)
>
> R1: Move to the left of the preceding combining sequence.
> (This is similar to combining class 224 in placement of the
> glyph, but these ones are not moved by canonical ordering
> of combining marks since they have combining class 0.)
>
> R2: Split; move the left part to the left of the preceding
> combining sequence.
Do you yet have any examples of R1/R2 as opposed to R3/R4?
> R3: Move to the left of the preceding *extended* combining
> sequence. (This is for the case where the pre-vowel is
> displayed to the left of the entire orthographic syllable.)
>
> R4: Split; move the left part to the left of the preceding
> *extended* combining sequence.
For Malayalam, some of the splitting forms are only optionally splitting!
Your proposal requires that splitting and non-splitting forms be different
characters.
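To make the four values easier to compare, here is a small Python sketch of
how R1-R4 might be applied. The property assignments and the split table are
purely hypothetical (no such data file exists), the name visual_order is
invented, and "combining" is again assumed to mean General Category M*. The
function returns a new string in display order and leaves its input alone.

```python
import unicodedata

# Hypothetical property values, for illustration only.
EXAMPLE_REORDRANT = {
    "\u093F": "R3",  # DEVANAGARI VOWEL SIGN I
    "\u0BC6": "R3",  # TAMIL VOWEL SIGN E
    "\u0BCA": "R4",  # TAMIL VOWEL SIGN O
}
# Hypothetical (left part, right part) table for splitting signs.
EXAMPLE_SPLITS = {"\u0BCA": ("\u0BC6", "\u0BBE")}


def visual_order(text, reordrant=EXAMPLE_REORDRANT, splits=EXAMPLE_SPLITS):
    """Return the characters of text in display order under R0-R4."""
    out = []
    plain_start = 0  # output index where the current combining sequence starts
    ext_start = 0    # output index where the current extended sequence starts
    after_virama = False
    for ch in text:
        if not unicodedata.category(ch).startswith("M"):
            # Base character: always starts a plain sequence, and starts an
            # extended sequence unless it follows a virama.
            if not after_virama:
                ext_start = len(out)
            plain_start = len(out)
            after_virama = False
            out.append(ch)
            continue
        prop = reordrant.get(ch, "R0")
        if prop == "R0":
            out.append(ch)
            if unicodedata.combining(ch) == 9:
                after_virama = True
            continue
        # R2 and R4 split; R1 and R3 move the whole sign.
        left, right = splits.get(ch, (ch, "")) if prop in ("R2", "R4") else (ch, "")
        if prop in ("R1", "R2"):
            out.insert(plain_start, left)
        else:
            out.insert(ext_start, left)
            plain_start += 1  # the plain sequence shifted right by one
        if right:
            out.append(right)
    return "".join(out)
```

With VOWEL SIGN I as R3, TTA + VIRAMA + TTHA + I comes out as I, TTA, VIRAMA,
TTHA; passing {"\u093F": "R1"} instead gives TTA, VIRAMA, I, TTHA, which is
the Mangal behaviour.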
> The "moves" here are pre-display moves, just like for bidi.
> The underlying character sequence is not affected.
>
> It is not yet clear if <super> and <sub> digits should be moved
> over as well, which has been suggested.
>
> However, from the examples given, it appears that any <super>
> and <sub> digits occur only *after* the full orthographic
> (Tamil) syllable. If that is always the case, the rules above
> would not be affected.
Not so in the example Naga gave: 'An on-line example was mentioned here:
http://www.prapatti.com/slokas/slokasbyname.html . Check out the column
"Tamil with numbered consonants"'. There the AA symbol and AU length mark
always follow the subscript.
> If instead <super> and <sub> digits do occur inside an orthographic
> syllable, as has been suggested, but no evidence yet given,
WRONG! See above.
> the rules
> for reordrant properties R1-R4 (or perhaps just for R1 and R2) would
> need to be extended (by, for these rules, considering also <super>
> and <sub> digits to be combining) to also indicate a move over
> <super> and <sub> digits. Obviously, any <super> and <sub> digits
> inside an orthographic syllable would break any ligature/conjunct
> formation.
I haven't seen any cases where they break consonant-vowel ligatures. I
don't believe breaking conjuncts is an actual issue for Tamil. However, if
someone were to do something as bizarre as use Brahmi or Grantha (not in
Unicode yet) forms with the Tamil repertoire, it would get interesting. In
theory subscripts shouldn't be any worse than nuktas, and for simple
conjuncts, as in Brahmi, I think they shouldn't break the conjuncts. After
all, they wouldn't present any problems when writing by hand.
It's not totally bizarre to think of using subscripts or superscripts in
other Indic scripts. After all, to answer one of Sinnathurai Srivas's
points, 'H' has been subscripted, sometimes by vowel, sometimes by number,
in Proto-Indo-European reconstructions, and 'r' and 'd' were being
subscripted when Proto-Austronesian phonemes were proliferating. (I
actually looked at a paper with a title like 'Yet Another Proto-Austronesian
Phoneme'.) Why shouldn't numeric subscripts be used in Devanagari for a
Hindi popularisation?
Richard.