Re: Tamil conjunct consonants (was: Encoding Tamil SRI)

Date: Fri Nov 07 2003 - 05:34:05 EST

    Peter Jacobi wrote,

    > So, which codepoint sequence will imply the disjoint form and
    > which will imply the ligated form? If 'Indic unification' still
    > holds, the conjunct form always is the default and the disjoint
    > form needs ZWNJ.
    > IMHO this doesn't fit actual Tamil use well and raises a lot of
    > practical problems.
    > Either there must be an accepted list of these ligatures (but
    > lists of archaic usage tend to grow), or one is bound to put a
    > preemptive ZWNJ after every SHA VIRAMA in modern use, to prevent
    > conjunct consonant forming.
    > If this archaic ligature problem extends to other grantha
    > consonants, even more preemptive ZWNJs are necessary for
    > contemporary Tamil.

    The Unicode string U+0BB2, U+0BC8 will display differently, depending
    on which font is used. (லை)

    Code2000 will display an old-fashioned ligature glyph, Latha will
    show a more modern alternative, and TabAvarangal2 will render the
    string in a proposed Tamil script-reform style.

    Yet, the underlying encoded character string is constant.
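
    As a rough illustration, the code points of that string can be
    listed with Python's standard unicodedata module. This is only a
    sketch; the helper name describe() is made up for the example.
    Nothing in the sequence records which glyph a given font chooses.

```python
import unicodedata

# The string discussed above: U+0BB2 followed by U+0BC8.
text = "\u0BB2\u0BC8"

def describe(s):
    """Return (code point, character name) pairs for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]

for code_point, name in describe(text):
    print(code_point, name)
```

    The listing is the same whichever font is installed; only the
    rendered glyph differs.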

    It may be possible and desirable to treat these archaic ligature
    forms similarly. Fonts designed for modern Tamil simply won't
    include these archaic ligature glyphs, so it shouldn't be necessary
    to insert ZWNJs all over the place in existing files.
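
    For comparison, the preemptive approach that this would avoid
    might look roughly like the Python sketch below. It assumes the
    SHA letter sits at U+0BB6, and the function name
    add_preemptive_zwnj() is hypothetical; it is not taken from any
    existing tool.

```python
import re

SHA = "\u0BB6"     # TAMIL LETTER SHA (assumed code point for the new letter)
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def add_preemptive_zwnj(text):
    """Insert ZWNJ after each SHA + VIRAMA that is not already followed by one."""
    pattern = f"{SHA}{VIRAMA}(?!{ZWNJ})"
    return re.sub(pattern, f"{SHA}{VIRAMA}{ZWNJ}", text)

# SHA + VIRAMA + RA + II, before and after the rewrite.
sample = SHA + VIRAMA + "\u0BB0\u0BC0"
patched = add_preemptive_zwnj(sample)
print([f"U+{ord(ch):04X}" for ch in patched])
```

    Every existing file containing SHA VIRAMA would need a pass like
    this, which is the cost the font-based treatment sidesteps.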

    Anyone seeking to reproduce a Tamil classic would need to specify
    an appropriate font which includes the archaic ligatures. Users
    whose systems lacked the appropriate font would still be able
    to read the document, however.

    IMHO, it's important to preserve options for users to explicitly
    control ligation in plain text. With these archaic Tamil ligatures,
    an author *may* elect to insert ZWNJs and other appropriate
    formatting characters to preserve such distinctions where

    I'm still concerned about the SHRII ligature encoding, though.
    Of course, it makes sense to treat the ligature as a conjunct
    of SHA + RA + II, but since SA + RA + II seems to have been
    the "official" way to encode the ligature, the proposed
    change will break existing implementations.

    It might be best to add the new SHA character without changing
    the existing SHRII encoding (SA + RA + II).
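
    To make the compatibility concern concrete, here is a small
    Python sketch comparing the two candidate sequences. It assumes
    the conjunct is spelled with an explicit VIRAMA between the
    consonants (the "SA + RA + II" shorthand above omits it) and that
    SHA is at U+0BB6; the names legacy_shrii, proposed_shrii and
    same_text() are invented for the example.

```python
import unicodedata

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA
SA = "\u0BB8"      # TAMIL LETTER SA
SHA = "\u0BB6"     # TAMIL LETTER SHA (assumed code point for the new letter)
RA = "\u0BB0"      # TAMIL LETTER RA
II = "\u0BC0"      # TAMIL VOWEL SIGN II

legacy_shrii = SA + VIRAMA + RA + II     # existing encoding
proposed_shrii = SHA + VIRAMA + RA + II  # encoding under the proposed change

def same_text(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(legacy_shrii, proposed_shrii))
```

    Normalization does not relate the two spellings, so a search or
    font lookup keyed to one sequence would not match text stored
    with the other.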

    Best regards,

    James Kass
