RE: Tamil Sri / Shri

From: Bala (
Date: Mon Nov 05 2007 - 02:30:13 CST

    Dear James Kass,
    Thank you very much for the detailed reply.

    We are in the process of defining the standards for TAMIL (தமிழ்) CHARACTER CODE FOR INFORMATION INTERCHANGE in Sri Lanka. I would like to clarify a few things by way of summary.

    In summary, Unicode wants the Grantha Shri in Tamil to be encoded in the following way (image1):
    0BB6 + 0BCD + 0BB0 + 0BC0
    ஶ + ் + ர + ீ

    The old sequence is no longer suggested for future encoding:
    0BB8 + 0BCD + 0BB0 + 0BC0
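
As a minimal sketch (not part of the original message), the two sequences above can be built from their code points in Python and compared. The helper names `to_text`, `describe` and `differing_positions` are illustrative assumptions; only the standard `unicodedata` module is used.

```python
import unicodedata

# Sequence given above for the Grantha Shri (image1).
SHRI_NEW = [0x0BB6, 0x0BCD, 0x0BB0, 0x0BC0]
# Older sequence that is no longer suggested.
SHRI_OLD = [0x0BB8, 0x0BCD, 0x0BB0, 0x0BC0]


def to_text(code_points):
    """Join a list of code points into a Python string."""
    return "".join(chr(cp) for cp in code_points)


def describe(code_points):
    """Return 'U+XXXX NAME' for each code point, using the Unicode character database."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in code_points]


def differing_positions(a, b):
    """Return the indexes at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for label, seq in (("new", SHRI_NEW), ("old", SHRI_OLD)):
        print(label, to_text(seq))
        for line in describe(seq):
            print("   ", line)
    # The two sequences differ only in their first code point (SHA versus SA).
    print("differs at index:", differing_positions(SHRI_NEW, SHRI_OLD))
```

Whether either string is displayed as a single ligature depends on the font and the rendering engine, not on this code.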

    What will be encoding for image2?

    Since each new sequence added to the Unicode standard will "break" existing data, Unicode wants to keep the encoding of the Grantha க்ஷ in Tamil as follows:
    0B95 + 0BCD + 0BB7
    க + ் + ஷ
    Again, thank you very much for your replies.

    Kind regards

    -----Original Message-----
    From: [] On Behalf Of James Kass
    Sent: Monday, November 05, 2007 12:28 PM
    To: 'Unicode Mailing List';
    Subject: Re: Tamil Sri / Shri

    ----- Original Message -----
    From: "Bala" <>

    >> But it appears that one ligature uses RA and the other uses RRA.
    > I feel very uncomfortable when the Tamil letters (ர, ற) are brought into
    > the discussion of the formation of sri/shri. ஶ and ஸ are Grantha
    > letters. This is like elements of two different scripts forming a
    > grammatical form.

    It is my impression that all of the Tamil letters have their counterparts
    in Grantha writing. If this is so, then I wonder if these old-fashioned
    letters are based on the SHA plus the Grantha equivalents of TAMIL

    Perhaps we can all agree that the ISCII model was an unfortunate choice
    from the viewpoint of many modern Tamil users.

    Unicode defines a sequence for the "shrii" glyph in Tamil Unicode text.
    As evidence of more old-fashioned letters like these becomes available,
    new sequences will probably be added to the standard.

    Each new sequence added to the standard will "break" existing data for
    those Tamil users who do not wish to see those old-fashioned letters.
    This is because the "virama model" which came from ISCII requires that
    the Indic script "conjuncts" be formed as part of the default condition
    of text.

    When users want to block formation of "conjuncts", the user must
    enter a special formatting character. (In this case U+200C : ZERO WIDTH
    NON-JOINER.) Since users in the past probably didn't expect that any
    new "conjunct" sequences would be added to the standard, they would
    not have been able to predict where to put any of those U+200C
    characters in their texts.
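
The problem described above can be sketched as a scan of existing text for sequences that a renderer may now join because no U+200C is present. This is only an illustration: the listing `CONJUNCT_SEQUENCES` and the function `find_unblocked` are hypothetical names, and the listing holds only the sequences quoted in this thread, not a complete or authoritative set.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Hypothetical listing; limited to the sequences quoted in this thread.
CONJUNCT_SEQUENCES = {
    "shri (SHA)": "\u0BB6\u0BCD\u0BB0\u0BC0",
    "shri (SA)": "\u0BB8\u0BCD\u0BB0\u0BC0",
    "kssa": "\u0B95\u0BCD\u0BB7",
}


def find_unblocked(text, sequences=CONJUNCT_SEQUENCES):
    """Return sorted (offset, label) pairs for listed sequences found in text.

    A sequence that already has ZWNJ after its virama is not reported,
    because the extra character means the plain sequence no longer matches.
    """
    hits = []
    for label, seq in sequences.items():
        start = text.find(seq)
        while start != -1:
            hits.append((start, label))
            start = text.find(seq, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "\u0BB6\u0BCD\u0BB0\u0BC0"          # shri, no ZWNJ
        " x \u0B95\u0BCD" + ZWNJ + "\u0BB7"  # kssa already blocked with ZWNJ
        " y \u0B95\u0BCD\u0BB7"             # kssa, no ZWNJ
    )
    for offset, label in find_unblocked(sample):
        print(offset, label)
```

A report like this only shows where an author would have had to place U+200C; it cannot tell which occurrences the author actually wanted joined.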

    Although we may all agree that a choice made in the past was unfortunate,
    we should not "live in the past" and we cannot change the past. We must
    live in the present and we may look toward the future.

    There are many Tamil computing professionals who are eminently qualified
    to make good, practical input methods and other software applications.

    It would be helpful if Tamil (and Grantha and Tamil Grantha) scholars could
    make a listing of forms which are needed to represent historic Tamil texts.

    This would serve as a basis for education of those Tamil computing
    professionals, as well as the rest of us. With this knowledge, input method
    programmers could devise solid, workable solutions to ensure that
    old-fashioned letters would not appear if the author of a document does not
    want them to appear, while preserving the option of displaying those
    letters if an author *does* want them to appear.

    With a complete listing, programmers could design input methods to
    contextually and automatically insert the special NON-JOINER formatting
    character, wherever it is needed, based on user preference.
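
A rough sketch of that idea follows, assuming a hypothetical listing `LISTED_SEQUENCES` and a boolean user preference. It places U+200C directly after the virama, which is where the virama model expects the non-joiner when a conjunct is to be blocked. The function names are illustrative and are not taken from any existing input method.

```python
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Hypothetical, incomplete listing; a real one would come from scholars.
SHRI = "\u0BB6\u0BCD\u0BB0\u0BC0"
KSSA = "\u0B95\u0BCD\u0BB7"
LISTED_SEQUENCES = (SHRI, KSSA)


def _blocked_form(seq):
    """Return seq with ZWNJ inserted directly after its first virama."""
    cut = seq.index(VIRAMA) + 1
    return seq[:cut] + ZWNJ + seq[cut:]


def block_conjuncts(text, sequences=LISTED_SEQUENCES, enabled=True):
    """Insert ZWNJ into every listed sequence when the user preference is on."""
    if not enabled:
        return text
    for seq in sequences:
        # After replacement the plain sequence no longer occurs,
        # so running this function twice changes nothing further.
        text = text.replace(seq, _blocked_form(seq))
    return text


def allow_conjuncts(text, sequences=LISTED_SEQUENCES):
    """Undo block_conjuncts for the listed sequences only."""
    for seq in sequences:
        text = text.replace(_blocked_form(seq), seq)
    return text


if __name__ == "__main__":
    typed = "a " + SHRI + " b " + KSSA
    blocked = block_conjuncts(typed)
    print([f"U+{ord(ch):04X}" for ch in blocked])
    print(allow_conjuncts(blocked) == typed)
```

Because only listed sequences are touched, anything missing from the listing passes through unchanged, which is the kind of surprise a complete listing is meant to prevent.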

    Without a complete listing, there are bound to be unpleasant surprises.

    Best regards,

    James Kass
