Re: Tamil Sri / Shri

From: James Kass (thunder-bird@earthlink.net)
Date: Mon Nov 05 2007 - 04:49:05 CST

    There does not appear to be a "rra" equivalent in Grantha, so
    my speculation about the basis of formation of a possible
    "shRii" ligature vs. the conventional "shrii" based on Grantha
    letters seems unlikely.

    http://www.mudgala.com/articles/grantha.html

    G.Balachandran wrote,

    > We are in the process of defining the standards for TAMIL (தமிழ்)
    > CHARACTER CODE FOR INFORMATION INTERCHANGE in Sri Lanka.
    > I would like to clarify few things as summary.
    >
    > (1)
    > In summary Unicode wanted to encode the Grantha Shri in Tamil
    > following way (image1)
    > 0BB6 + 0BCD + 0BB0 + 0BC0
    > ஶ + ் + ர + ீ
    >
    > No more old way is suggested in the future encoding
    > 0BB8 + 0BCD + 0BB0 + 0BC0
    >
    > (2)
    > What will be encoding for image2?
    >
    > (3)
    > Since each new sequence added to the Unicode standard will "break"
    > existing data, The Unicode wanted to keep the encoding to the
    > Grantha க்ஷ in Tamil following way.
    > 0B95 + 0BCD + 0BB7
    > க + ் + ஷ

    Dear Bala,

    For point (1), I think you are correct.

    (1a) The preferred method of encoding "shrii" is
    0BB6 + 0BCD + 0BB0 + 0BC0 (ஶ + ் + ர + ீ)

    (1b) The old way for "shrii" is not recommended
    for future text encoding, but is supported on Windows
    starting with Windows 2000. Legacy data exists using
    the old way, so the old way may be supported as long
    as there is concern about backward-compatibility.

    (1c) Many operating systems do not support the new way.
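
    The two encodings in (1a) and (1b) can be compared with a short
    Python sketch. It only uses the standard unicodedata module. The
    helper names describe and upgrade_legacy_shrii are hypothetical
    and chosen for illustration; whether legacy data should be
    rewritten at all is a policy decision this sketch does not settle.

```python
import unicodedata

# Preferred sequence from (1a): SHA + VIRAMA + RA + VOWEL SIGN II.
SHRII_NEW = "\u0BB6\u0BCD\u0BB0\u0BC0"
# Legacy sequence from (1b): SA + VIRAMA + RA + VOWEL SIGN II.
SHRII_OLD = "\u0BB8\u0BCD\u0BB0\u0BC0"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def upgrade_legacy_shrii(text):
    """Hypothetical migration helper: rewrite the legacy sequence
    to the preferred one. Assumes every legacy occurrence is meant
    to be the shrii ligature, which may not hold for all data."""
    return text.replace(SHRII_OLD, SHRII_NEW)


if __name__ == "__main__":
    for label, seq in (("new", SHRII_NEW), ("old", SHRII_OLD)):
        print(label)
        for line in describe(seq):
            print("   ", line)
    # The two sequences differ only in the first code point, so a
    # plain string comparison treats them as different text.
    print("equal as strings:", SHRII_NEW == SHRII_OLD)
```

    Because the two sequences compare unequal, searching and sorting
    will treat legacy and new data as different text unless an
    application folds one into the other.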

    (2a) I don't know how, or whether, the difference between
    image1 and image2 should be represented in plain text.

    (2b) If image2 is only a variant of image1, then the difference
    cannot be distinguished in plain text, and would require
    rich text and a font change. (Using a variation sequence
    here does not seem possible under the current language
    of the standard because a variation selector character can
    only be applied to a single base character.)

    (2c) If image2 represents a different letter/ligature than
    image1, then the difference should be distinguishable at
    the character/plain text level, even if the two
    letter/ligatures at some point became conflated with each
    other and have been used interchangeably since.

    (2d) If image2 is a special form of image1 which represents
    the goddess Lakshmi (Luxmi) and related concepts, and if image1
    does not, then image2 could be encoded as a symbol. Users
    could then use that symbol in running text in place of image1
    wherever they deem appropriate.

    (3a) Unicode wanted to keep the encoding of kssa as
    0B95 + 0BCD + 0BB7 instead of adding a new character for
    kssa because kssa can already be expressed in plain text
    using Unicode. Adding a new character for something which can
    already be expressed in Unicode would, for one thing, increase
    the opportunities for spoofing/"phishing"/internet fraud.

    (3b) Please see the page
    http://unicode.org/pending/proposals.html
    for a more detailed explanation, in particular the section
    under "Proposal Guidelines" starting with "Often a proposed
    character can be expressed..."
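
    The spoofing concern for kssa can be illustrated with a minimal
    Python sketch. No atomic kssa character exists, so the sketch
    uses a private-use code point (U+E000) purely as a hypothetical
    stand-in for one; that stand-in and the helper name
    is_normalization_stable are assumptions made for illustration.

```python
import unicodedata

# Existing encoding of kssa: KA + VIRAMA + SSA.
KSSA = "\u0B95\u0BCD\u0BB7"

# Hypothetical stand-in for a separately encoded kssa character.
# U+E000 is a private-use code point, NOT a real kssa character.
HYPOTHETICAL_ATOMIC_KSSA = "\uE000"


def is_normalization_stable(text):
    """True if none of the four Unicode normalization forms
    changes the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


if __name__ == "__main__":
    # The existing sequence has a single spelling under normalization.
    print("sequence is stable:", is_normalization_stable(KSSA))
    # A second encoding for the same visible letter would give two
    # strings that could look alike to a reader but never compare
    # equal, which is what makes look-alike identifiers possible.
    print("same string:", KSSA == HYPOTHETICAL_ATOMIC_KSSA)
```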

    When I spoke of "breaking" applications for many users
    when new sequences are added, I meant sequences like:
       TAMIL LETTER SHRII;0BB6 0BCD 0BB0 0BC0
    ...which Peter Constable mentioned as a provisional named
    sequence on the public Unicode list on 2006/07/26.
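
    A line in the "NAME;code points" form quoted above can be turned
    into a string with a few lines of Python. This is only a sketch
    that assumes the semicolon-separated layout shown in that line;
    the function name parse_named_sequence is hypothetical.

```python
def parse_named_sequence(line):
    """Split a 'NAME;XXXX XXXX ...' line into (name, string).

    The code points are read as space-separated hexadecimal values.
    """
    name, _, code_points = line.strip().partition(";")
    text = "".join(chr(int(cp, 16)) for cp in code_points.split())
    return name.strip(), text


if __name__ == "__main__":
    name, text = parse_named_sequence("TAMIL LETTER SHRII;0BB6 0BCD 0BB0 0BC0")
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```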

    The proposal for letter SHA
    ( http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2617.pdf )
    mentions that "... SHA may also form ligatures in combination with
    MA, YA, [R]RA, and VA. However, these ligatures are archaic and
    are not widely recognized. Contemporary publications use only
    disjointed forms."

    Since the SHA character isn't well supported yet, and since
    a combination like 0BB6 + 0BCD (ஶ்) displays on this system
    with a dotted circle, any problem with existing data display
    suddenly changing from contemporary disjointed forms into
    archaic ligatures seems unlikely. (There probably isn't too
    much existing data using the SHA character.)

    But, if archaic ligatures exist for other letter combinations
    (with letters other than SHA), and they become "named
    sequences", then existing data would result in a "broken"
    display.

    I apologize for my lengthy answers. Many of the concepts
    involved are complex. I hope this is truly helpful. I'm also
    sorry that I don't know the answer to point (2).

    Best regards,

    James Kass


