From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Sat Sep 03 2005 - 17:20:38 CDT
As yet, Unicode does not have any character property describing
how reordrant (combining) characters reorder. I would suggest
that such a property be introduced. To explain the property
values for this suggested new property, first define "extended
combining sequence".
An extended combining sequence is a combining sequence where
one considers any base character that occurs after a virama
(cc=9; even if other truly combining characters intervene)
as combining.
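
The definition above can be sketched in Python as follows. This is
only an illustration under stated assumptions: the function name is
made up for this example, "combining" is approximated as General
Category M*, and every other character (not just letters) is treated
as a base character, which is a literal reading of the definition.

```python
import unicodedata


def extended_combining_sequences(text):
    """Split text into extended combining sequences (hypothetical helper).

    A base character that follows a virama (combining class 9) is kept
    in the current sequence, even if other marks intervene.
    """
    sequences = []
    current = ""
    virama_pending = False  # a virama was seen and no base has followed it yet
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            # Combining mark: always stays with the current sequence.
            current += ch
            if unicodedata.combining(ch) == 9:
                virama_pending = True
        elif virama_pending:
            # Base after a virama: considered combining for this purpose.
            current += ch
            virama_pending = False
        else:
            # Ordinary base: starts a new sequence.
            if current:
                sequences.append(current)
            current = ch
    if current:
        sequences.append(current)
    return sequences
```

For example, <U+0B95, U+0BCD, U+0BB7, U+0BC6> (Tamil KA, VIRAMA, SSA,
VOWEL SIGN E) comes out as one extended combining sequence, whereas
<U+0B95, U+0BC6, U+0BB7> comes out as two.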
I would suggest having the following property values, with their
meanings, for the reordrant property:
R0: Non-reordrant. (This value is not to be listed explicitly in the
prospective data file, since most characters have this value
for the reordrant property.)
R1: Move to the left of the preceding combining sequence.
(This is similar to combining class 224 in placement of the
glyph, but these characters are not moved by canonical ordering
of combining marks, since they have combining class 0.)
R2: Split; move the left part to the left of the preceding
combining sequence.
R3: Move to the left of the preceding *extended* combining
sequence. (This is for the case where the pre-vowel is
displayed to the left of the entire orthographic syllable.)
R4: Split; move the left part to the left of the preceding
*extended* combining sequence.
The "moves" here are pre-display moves, just like for bidi.
The underlying character sequence is not affected.
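
As a rough sketch of how R1-R4 might be applied when computing a
display order, consider the Python below. Several things here are
assumptions and not part of the proposal: the function name, passing
the property as a plain dict (no data file exists), approximating
"combining" as General Category M*, and obtaining the two parts of a
split character from its canonical decomposition, with the first
code point taken as the left part.

```python
import unicodedata


def display_order(text, reordrant):
    """Return a pre-display ordering of text (hypothetical sketch).

    reordrant maps a character to "R1".."R4"; anything absent is R0.
    The input string itself is not modified.
    """
    out = []
    cs_start = 0   # index in out where the current combining sequence starts
    ecs_start = 0  # same, for the current *extended* combining sequence
    virama_pending = False
    for ch in text:
        if not unicodedata.category(ch).startswith("M"):
            # Base character: starts a new combining sequence; it starts a
            # new extended sequence only if it does not follow a virama.
            if not virama_pending:
                ecs_start = len(out)
            cs_start = len(out)
            virama_pending = False
            out.append(ch)
            continue
        if unicodedata.combining(ch) == 9:
            virama_pending = True
        value = reordrant.get(ch, "R0")
        if value == "R0":
            out.append(ch)
            continue
        left, right = ch, ""
        if value in ("R2", "R4"):
            # Assumption: the split parts come from the canonical decomposition.
            parts = unicodedata.normalize("NFD", ch)
            if len(parts) >= 2:
                left, right = parts[0], parts[1:]
        target = cs_start if value in ("R1", "R2") else ecs_start
        out.insert(target, left)
        # Keep the start indices pointing at the same characters as before.
        if cs_start >= target:
            cs_start += 1
        if ecs_start >= target:
            ecs_start += 1
        out.extend(right)
    return "".join(out)
```

With <U+0B95, U+0BCD, U+0BB7, U+0BC6> as input, tagging U+0BC6 as R1
places it just before U+0BB7, while tagging it as R3 places it before
U+0B95. Which value any real character should get is left open here;
the tags are chosen only to show the difference.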
It is not yet clear whether the moves should also pass over <super>
and <sub> digits, as has been suggested.
However, from the examples given, it appears that any <super>
and <sub> digits occur only *after* the full orthographic
(Tamil) syllable. If that is always the case, the rules above
would not be affected.
If instead <super> and <sub> digits do occur inside an orthographic
syllable, as has been suggested (though no evidence has yet been given),
the rules for reordrant properties R1-R4 (or perhaps just R1 and R2)
would need to be extended to also indicate a move over <super> and
<sub> digits. This could be done by considering, for these rules,
<super> and <sub> digits to be combining as well. Obviously, any
<super> and <sub> digits inside an orthographic syllable would break
any ligature/conjunct formation.
/Kent K