RE: Encoding Tamil SRI

From: Peter Jacobi (peter_jacobi@gmx.net)
Date: Wed Nov 05 2003 - 23:10:58 EST


    Dear Peter Constable, All,

    Thank you for your answer, but unfortunately my question seems not to
    have been clear enough. I'll try to explain it better.

    I wrote:
    > > 0x82 : 0x0BB8 0x0BCD 0x0BB0 0x0BC0
    (current mapping for SRI glyph, TSCII codepoint 0x82)
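    For reference, a minimal Python sketch that spells out this current
    mapping code point by code point. The dictionary name TSCII_SRI_CURRENT
    and the helper describe() are hypothetical illustrations, not part of
    any TSCII converter; the character names come from Python's standard
    unicodedata module.

```python
import unicodedata

# Current TSCII -> Unicode mapping for the SRI byte 0x82, as quoted above.
# (Hypothetical table name; only this one entry is modelled.)
TSCII_SRI_CURRENT = {0x82: "\u0BB8\u0BCD\u0BB0\u0BC0"}

def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unassigned>')}"
            for ch in seq]

for line in describe(TSCII_SRI_CURRENT[0x82]):
    print(line)
```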

    You wrote:
    > That is exactly how it should be encoded.
    [...]
    > Of course, in order to comment, one would need to know why the above is
    > not satisfactory.

    The point is that, unlike the northern Indian scripts, Tamil doesn't form
    conjunct consonants. So the sequence:
    0x0BB8 0x0BCD 0x0BB0 0x0BC0
    which is
    (TAMIL SA) (TAMIL VIRAMA) (TAMIL RA) (TAMIL VOWEL SIGN II)
    should appear the same as
    (TAMIL SA) (TAMIL VIRAMA) (ZWNJ) (TAMIL RA) (TAMIL VOWEL SIGN II)

    The ZWNJ would be redundant for Tamil.

    Compare with
    (TAMIL SA) (TAMIL VIRAMA) (TAMIL RA) (TAMIL VOWEL SIGN *)
    for all the other TAMIL VOWEL SIGNs.

    They are all correctly rendered as non-conjunct.
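    One consequence worth illustrating: even though the ZWNJ is visually
    redundant in Tamil, it still produces a different code point sequence,
    and Unicode normalization does not remove it. A minimal Python sketch
    (variable names are illustrative only) showing that the two spellings
    stay distinct under every normalization form, so string comparison and
    search treat them as different words:

```python
import unicodedata

SA, VIRAMA, RA, II = "\u0BB8", "\u0BCD", "\u0BB0", "\u0BC0"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

plain = SA + VIRAMA + RA + II
with_zwnj = SA + VIRAMA + ZWNJ + RA + II

# ZWNJ has no decomposition, so no normalization form drops it;
# the two spellings remain distinct strings.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, plain) == unicodedata.normalize(form, with_zwnj)
    print(form, "equal" if same else "distinct")
```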

    > > Alternatives given were
    > > (0BB8)(0BCD)(0BB1)(0BC0)
    > > (0BB6)(0BCD)(0BB1)(0BC0) (if and when U+0BB6 becomes Unicode)
    > > (0B9A)(0BBF)(0BB1)(0BC0)
    >
    > Alternatives to what?

    As I said, rendering (0BB8) (0BCD) (0BB0) (0BC0) as the
    conjunct glyph SRI is foreign to Tamil. It seems strange to
    have to insert ZWNJ to use (0BB8) (0BCD) (0BB0) (0BC0)
    in normal Tamil, when ZWNJ is unnecessary for all other
    combinations.

    The problem is especially visible because (0BB8) (0BCD) (0BB0) (0BC0),
    rendered as normal Tamil (and so currently required to be written
    (0BB8) (0BCD) ZWNJ (0BB0) (0BC0)), is an actual Tamil word.

    The above alternatives were proposed because they are not in
    use as Tamil words but are phonetically similar to SRI. Assigning the
    SRI glyph to one of these sequences was proposed as a 'lesser evil'
    measure.
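    To make the three quoted alternatives concrete, a minimal Python sketch
    that lists their code point names and checks that none collides with
    the current SRI sequence. Whether U+0BB6 is reported as assigned depends
    on the Unicode version bundled with the interpreter (available as
    unicodedata.unidata_version); the list and helper names are illustrative
    assumptions.

```python
import unicodedata

CURRENT_SRI = "\u0BB8\u0BCD\u0BB0\u0BC0"  # SA VIRAMA RA II

# The alternative sequences quoted earlier in this thread.
ALTERNATIVES = [
    "\u0BB8\u0BCD\u0BB1\u0BC0",  # SA VIRAMA RRA II
    "\u0BB6\u0BCD\u0BB1\u0BC0",  # U+0BB6 VIRAMA RRA II (U+0BB6 unassigned in 2003)
    "\u0B9A\u0BBF\u0BB1\u0BC0",  # CA VOWEL-SIGN-I RRA II
]

def names(seq: str) -> list[str]:
    """Character names, or '<unassigned>' if this Python's Unicode DB lacks one."""
    return [unicodedata.name(ch, "<unassigned>") for ch in seq]

print("Unicode data version:", unicodedata.unidata_version)
for alt in ALTERNATIVES:
    print(" + ".join(names(alt)))
```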

    Given that it is not easy to get a distinct TAMIL SRI character, my
    non-expert view is that the SRI glyph can be mapped to
    (TAMIL SA) (TAMIL VIRAMA) (CGJ) (TAMIL RA) (TAMIL VOWEL SIGN II)

    Then the normal sequence
    (TAMIL SA) (TAMIL VIRAMA) (TAMIL RA) (TAMIL VOWEL SIGN II)

    is recovered for use as the normal Tamil word, which currently must be written

    (TAMIL SA) (TAMIL VIRAMA) (ZWNJ) (TAMIL RA) (TAMIL VOWEL SIGN II)
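    The proposal can be sketched in Python as a hypothetical TSCII table
    entry (PROPOSED_TSCII is an illustrative name, not an existing
    converter). Because CGJ (U+034F) has no decomposition and canonical
    combining class 0, it survives NFC just as ZWNJ does, so the SRI form,
    the plain sequence, and today's ZWNJ spelling remain three distinct
    strings, and a font could key the SRI ligature on the CGJ form alone.

```python
import unicodedata

SA, VIRAMA, RA, II = "\u0BB8", "\u0BCD", "\u0BB0", "\u0BC0"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
CGJ = "\u034F"   # COMBINING GRAPHEME JOINER

# Hypothetical TSCII -> Unicode entry for byte 0x82 under the proposal.
PROPOSED_TSCII = {0x82: SA + VIRAMA + CGJ + RA + II}

sri = PROPOSED_TSCII[0x82]
plain_word = SA + VIRAMA + RA + II        # freed for the ordinary word
zwnj_word = SA + VIRAMA + ZWNJ + RA + II  # how the word must be written today

# Count distinct strings after NFC normalization.
forms = {unicodedata.normalize("NFC", s) for s in (sri, plain_word, zwnj_word)}
print(len(forms))  # 3
```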

    > Alternatives to what? The first and third sequence would have distinct
    > appearances (see attached file),

    This is circular logic. The current display of 0x0BB8 0x0BCD 0x0BB0 0x0BC0
    as SRI is only due to the mapping of this sequence to the SRI glyph
    embedded in the font's glyph substitution table.

    > and would constitute distinct
    > spellings.

    That's the point. The TAMIL SRI is not spelled
    (TAMIL SA) (TAMIL VIRAMA) (TAMIL RA) (TAMIL VOWEL SIGN II) by
    Tamils.
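    The circular-logic point can be illustrated with a toy model of a
    font's ligature table: whether the plain sequence appears as SRI depends
    only on whether the table contains an entry for it. This Python sketch
    is a deliberate simplification (real Tamil shaping, e.g. via OpenType
    GSUB, does much more); the function shape() and the glyph names
    "tamil_sri" and "uniXXXX" are hypothetical.

```python
# Toy ligature table: code point sequence -> glyph name (names hypothetical).
SA, VIRAMA, RA, II = "\u0BB8", "\u0BCD", "\u0BB0", "\u0BC0"

def shape(text: str, ligatures: dict[str, str]) -> list[str]:
    """Greedy longest-match substitution; unmatched code points become
    placeholder glyph names of the form 'uniXXXX'."""
    glyphs, i = [], 0
    keys = sorted(ligatures, key=len, reverse=True)
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                glyphs.append(ligatures[key])
                i += len(key)
                break
        else:  # no ligature matched at position i
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs

word = SA + VIRAMA + RA + II
with_sri_entry = {word: "tamil_sri"}  # the table entry that makes it "SRI"

print(shape(word, with_sri_entry))  # ['tamil_sri']
print(shape(word, {}))              # ['uni0BB8', 'uni0BCD', 'uni0BB0', 'uni0BC0']
```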

    Regards,
    Peter Jacobi



    This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 23:42:54 EST