Re: New property for reordrant dependent vowels reordering?

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Sep 04 2005 - 23:17:33 CDT


    Kent Karlsson wrote (
    http://www.unicode.org/mail-arch/unicode-ml/y2005-m09/0023.html ):
    > Richard Wordingham wrote:
    >
    >>>> TTA and TTHA and no half-form for TTA, this sequence is two
    >>>> orthographic syllables - TTA + VIRAMA and TTHA + I.
    >
    > There is just one orthographic syllable, the virama
    > ties the consonants together (regardless of whether they form a
    > conjunct/ligature or not).
    >
    > An orthographic syllable here is (simplified):
    >
    > <consonant, {combining marks, at least one of which is a virama}>*
    > <consonant, {maybe combining marks, no virama}>
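    A minimal Python sketch of the simplified grammar quoted above, restricted
    to Devanagari. The character classes are a hand-picked assumption rather
    than a Unicode-derived classification, the optional ZWNJ/ZWJ after the
    virama is an added assumption, and orthographic_syllables is a
    hypothetical name:

```python
import re

# Simplified Devanagari classes (illustrative assumption, not complete).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"
VIRAMA = "\u094D"
# Combining marks other than the virama: candrabindu/anusvara/visarga,
# nukta, dependent vowel signs, stress marks, vocalic L/LL vowel signs.
OTHER_MARK = "[\u0900-\u0903\u093C\u093E-\u094C\u0951-\u0954\u0962\u0963]"
# Assumption: a ZWNJ or ZWJ may follow the virama without breaking the syllable.
JOINER = "[\u200C\u200D]"

# <consonant, marks incl. a virama>* <consonant, marks without a virama>
SYLLABLE = re.compile(
    f"(?:{CONSONANT}{OTHER_MARK}*{VIRAMA}{JOINER}?)*{CONSONANT}{OTHER_MARK}*"
)


def orthographic_syllables(text: str) -> list[str]:
    """Split text into orthographic syllables under the simplified grammar.

    Characters the grammar does not cover (spaces, independent vowels,
    a final virama with no following consonant, ...) are skipped.
    """
    return SYLLABLE.findall(text)
```

    Under this grammar <TTA, VIRAMA, TTHA, I> comes out as a single syllable,
    as Kent describes, whether or not a ZWNJ follows the virama.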

    This is definitely not true for Burmese, where the 'native' spelling does
    not allow orthographic syllables to straddle phonetic syllable boundaries.
    CVCDV..., where D is an oral stop consonant, is split into three
    orthographic syllables: CV, C + visible virama, and DV. The split CV, CDV is a
    mark of foreign (chiefly Pali) origin. As with other Pallava scripts,
    several Burmese matras appear on the far left, even in CCV syllables.

    The formulation above is the sort of thing that makes users complain that
    this use of virama is unnatural. To me, a more natural formulation is:

    <consonant, {combining marks, at least one of which is a conjoiner}>*
    <consonant, {maybe combining marks or visible virama, no conjoiner}>
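    The formulation above can be sketched the same way, keeping the conjoiner
    separate from the visible virama. Khmer makes a convenient test case
    because it encodes the two as different characters (U+17D2 KHMER SIGN
    COENG and U+17D1 KHMER SIGN VIRIAM). The ranges below are a simplified
    assumption, not a full Khmer classification, and the function name is
    hypothetical:

```python
import re

# Simplified Khmer classes (illustrative assumption).
CONSONANT = "[\u1780-\u17A2]"
CONJOINER = "\u17D2"  # KHMER SIGN COENG: ties the next consonant on
# Dependent vowel signs and other signs, minus COENG. This class includes
# VIRIAM (U+17D1), the visible virama, which does NOT join consonants.
MARK = "[\u17B6-\u17D1\u17D3]"

# <consonant, marks incl. a conjoiner>* <consonant, marks or visible virama, no conjoiner>
SYLLABLE = re.compile(f"(?:{CONSONANT}{MARK}*{CONJOINER})*{CONSONANT}{MARK}*")


def orthographic_syllables(text: str) -> list[str]:
    """Split Khmer text into orthographic syllables; uncovered characters are skipped."""
    return SYLLABLE.findall(text)
```

    For example, <SA, COENG, RO, II> stays one syllable, while a consonant
    carrying VIRIAM closes its syllable and the next consonant starts a new
    one.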

    Now in most Indic scripts, the conjoiner and virama are encoded the same,
    though not in Khmer, where the visible virama seems to be falling out of
    use. The common encoding is encouraged by the fact that the conjoiner may
    surface as a visible virama in Devanagari.
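    The encoding difference can be checked against the Unicode Character
    Database with Python's standard unicodedata module; the SIGNS table is
    just an illustrative grouping:

```python
import unicodedata

# Devanagari: one character serves as both conjoiner and visible virama.
# Khmer: the conjoiner (COENG) and the visible virama (VIRIAM) are distinct.
SIGNS = {
    "Devanagari": ["\u094D"],
    "Khmer": ["\u17D2", "\u17D1"],
}

for script, chars in SIGNS.items():
    for ch in chars:
        print(f"{script}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```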

    > Conjuncts are just ligatures of <consonant, virama, consonant> and
    > are indeed mostly optional (though some are very common) and certainly
    > font dependent (and conjuncts may in turn ligate with a following
    > dependent vowel).
    >
    >> For Brahmi, Khmer and Dai Lanna, the conjuncts are generally not
    >> ligated.
    >
    > You mean that the consonants in the orthographic syllables do not
    > generally form conjuncts/ligatures...

    No! I was in general wrong when I said the font determined the conjuncts.
    For most Indian scripts the font does, but it does not for Brahmi, Burmese
    or Khmer and I think not for Tibetan and Dai Lanna (at least, when vowels do
    not interpose). The primitive method of forming conjuncts is just to stack
    the consonants vertically, and this is open to any combination of
    consonants. These scripts have not moved far from this system, and the
    Tibetan model of having separately encoded conjunct/subscript forms works
    for them.
    In these scripts the subscript forms do not (in general) combine with the
    base (i.e. initial) consonant. Dai Lanna, or at least the Thai variant, has
    a few ligatures. More notably, the Burmese subscript forms do ligate with
    one another. There are also repha-type complications and some subscript
    forms send shoots up to the baseline from which the consonants hang.
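    The Tibetan model of separately encoded subscript forms can be sketched
    in Python. This assumes the regular layout of the Tibetan block, in which
    most subjoined letters sit 0x50 above their base letters (U+0F40 KA and
    U+0F90 SUBJOINED KA, for example). The function name tibetan_stack is
    hypothetical, and fixed-form letters and other special cases are ignored:

```python
import unicodedata

SUBJOINED_OFFSET = 0x50  # e.g. U+0F40 KA -> U+0F90 SUBJOINED KA


def tibetan_stack(letters):
    """Encode a vertical stack: the first letter as base, the rest subjoined."""
    base, *rest = letters
    out = [base]
    for ch in rest:
        sub = chr(ord(ch) + SUBJOINED_OFFSET)
        # Guard: only accept code points that really are subjoined letters.
        if "SUBJOINED" not in unicodedata.name(sub, ""):
            raise ValueError(f"no subjoined form for U+{ord(ch):04X}")
        out.append(sub)
    return "".join(out)
```

    For instance, SA with KA and YA below is encoded as <U+0F66, U+0F90,
    U+0FB1>; no virama is involved, so any consonant combination can be
    stacked without the font having to supply a conjunct.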

    > Peter Constable wrote:
    >
    >> Is *not* so obvious to a human. In fact, there are two writing
    >> conventions that you will find in use in the event of a consonant
    >> cluster where C1 has no half form and a conjunct ligature is not used.
    >> One is to place I before the killed consonant, the other placing the I
    >> after the killed consonant / before the live consonant.
    >
    > That is, as Eric Muller wrote, then two *orthographic* conventions.

    These are not the two Eric Muller spoke of. We are talking of three
    conventions where half-forms are not available. In Devanagari visual order
    they are:

    1) <i da virama dha>
    2) <da virama i dha>
    3) <i d.dha>

    Peter is referring to all three; Eric Muller to forms (2) and (3).
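    The three conventions can be sketched as a mapping from the logical order
    <da, virama, dha, i> to a visual glyph order. The glyph names, the
    Convention enum and visual_order are illustrative assumptions, not any
    shaping engine's API:

```python
from enum import Enum


class Convention(Enum):
    """The three visual orders listed above (names are illustrative)."""
    CLUSTER_START = 1   # (1) <i da virama dha>
    LAST_CONSONANT = 2  # (2) <da virama i dha>
    CONJUNCT = 3        # (3) <i d.dha>


def visual_order(cluster, convention):
    """Map a logical-order cluster to a visual glyph sequence.

    `cluster` lists glyph names in logical order; its last item is the
    pre-base vowel sign "i", e.g. ["da", "virama", "dha", "i"].
    """
    *body, vowel = cluster
    if convention is Convention.CLUSTER_START:
        # The vowel sign moves to the far left of the whole cluster.
        return [vowel] + body
    if convention is Convention.LAST_CONSONANT:
        # The vowel sign goes just before the final (live) consonant.
        return body[:-1] + [vowel] + body[-1:]
    # CONJUNCT: assume the font ligates the consonants into one glyph,
    # dropping the inherent "a" of each killed consonant ("da" -> "d").
    parts = []
    for i, glyph in enumerate(body):
        if glyph == "virama":
            continue
        killed = i + 1 < len(body) and body[i + 1] == "virama"
        parts.append(glyph[:-1] if killed else glyph)
    return [vowel, ".".join(parts)]
```

    Only (3) depends on whether the font has the conjunct; (1) and (2) differ
    purely in where the vowel sign is placed.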

    > These must be *reliably* be distinguished in the underlying text.
    > It must NOT be font dependent (for properly constructed fonts).

    This would be unreasonable if you are referring to (2) v. (3). You would be
    requiring that for each *language* all Devanagari fonts have the same
    language-dependent repertoire of conjuncts.

    >> In Windows Vista, Uniscribe is being updated to support either
    >> convention. The font implementation will determine which is used by
    >> default.

    I don't see any simple mechanism whereby an OpenType font (as opposed, e.g.,
    to an AAT font) can select between (1) and (2).

    >> One can always force the I to go after the killed consonant by
    >> inserting ZWNJ; e.g., < TTA, VIRAMA, ZWNJ, TTHA, I >.
    >
    > I don't think ZWNJ (in that position) is the appropriate way to
    > distinguish these two orthographic conventions, since that is
    > used for another distinction. But a ZWJ just before the dependent
    > vowel *may* be a possible way to distinguish these two orthographic
    > conventions (font independently!); e.g.:
    >
    > < TTA, VIRAMA, ZWNJ, TTHA, I > -- I to the extreme left, with visible
    > virama

    With Uniscribe, that currently yields <tta virama i ttha>.

    > < TTA, VIRAMA, TTHA, I > -- I to the extreme left, using conjunct (if in
    > font)

    With Uniscribe and Mangal 1.20, that currently yields <i tta virama ttha>.
    In Windows Vista, this is to be overridable, I presume by feature selection.

    > < TTA, VIRAMA, ZWNJ, TTHA, ZWJ, I > -- I before TTHA, with visible
    > virama

    I believe the ZWJ should currently be redundant, and for consistency with
    workable Burmese should remain so. With the Uniscribe I'm using, it
    actually forces a new cluster and thus generates the dotted circle.

    > < TTA, VIRAMA, TTHA, ZWJ, I > -- I before TTHA, with visible virama

    Same Uniscribe problem as above.
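    The dotted circle can be illustrated with a deliberately simplified
    model. The sketch assumes (this is not Uniscribe's actual logic) that a
    Devanagari dependent vowel sign must immediately follow a consonant or
    nukta, and inserts U+25CC DOTTED CIRCLE before any other matra; the name
    mark_orphaned_matras is hypothetical:

```python
CONSONANTS = {chr(c) for c in range(0x0915, 0x093A)} | {chr(c) for c in range(0x0958, 0x0960)}
NUKTA = "\u093C"
MATRAS = {chr(c) for c in range(0x093E, 0x094D)}  # U+093E..U+094C
DOTTED_CIRCLE = "\u25CC"


def mark_orphaned_matras(text):
    """Insert U+25CC before any matra not directly after a consonant or nukta.

    Simplified model (assumption): anything else before the matra,
    including a ZWJ, leaves the matra without a base.
    """
    out = []
    for i, ch in enumerate(text):
        if ch in MATRAS:
            prev = text[i - 1] if i else ""
            if prev not in CONSONANTS and prev != NUKTA:
                out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)
```

    In this model <TTA, VIRAMA, ZWNJ, TTHA, ZWJ, I> gets a dotted circle
    before I, while the same sequence without the ZWJ is left unchanged.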

    I'm happier with the current Uniscribe schemes:

    <TTA, I, VIRAMA, ZWNJ, TTHA> yields vowel on the left - टि्‍ठ.
    <TTA, VIRAMA, ZWNJ, TTHA, I> yields vowel in the middle - ट्‍ठि.

    Actually, <TTA, I, VIRAMA, TTHA> also yields vowel on the left - टि्ठ. I'm
    not sure that this should be recommended though - it requires that my
    proposed loose definition of an orthographic cluster be amended from

    <consonant, {combining marks, at least one of which is a conjoiner}>*
    <consonant, {maybe combining marks or visible virama, no conjoiner}>

    to

    <consonant, {combining marks, at least one of which is a conjoiner}>*
    <consonant, {combining marks with explicit vowel or visible virama, or
    leaving the implicit vowel unstripped, or no conjoiner}>

    Although closer to the official definition (which has a problem for
    Burmese), the second one needs further modification to allow independent
    vowels with subscript consonants, as occur in Khmer.

    Richard.



    This archive was generated by hypermail 2.1.5 : Sun Sep 04 2005 - 23:21:55 CDT