Sinhala conventions

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Mon Oct 25 2004 - 11:42:12 CST


    Sorry if you receive this twice: I posted it to the Indic list (the
    appropriate one, AFAIK) but copied the general list, since experts who do
    not read the former might help. Please answer only on the Indic list to
    avoid more duplicates; thanks in advance.

    Following a recent thread, I am trying to understand the minutes of the June
    meeting. I read there

        [99-C37] Consensus: The UTC recommends that "right-side" forms
        of conjuncts in Sinhala be represented by a sequence of <zwj,
        virama, consonant>. [L2/04-131]

    I am not allowed to retrieve L2/04-131 itself through
    http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/04-131, but an
    equivalent copy is publicly available at
    http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2737.pdf (I guess it is the same
    because the latter says explicitly "L2/04-131" ;-)). This is a committee
    draft, released for public comment on 2004-04-15, of revision 2 of SLS 1134
    (the encoding of Sinhala, a Sri Lanka standard).

    I am very interested to learn about the "zwj,vir,cons" sequence, and not
    only because I spent a few hours at the end of July analysing this very
    sequence (in response to http://www.unicode.org/review/pr-37.pdf), while it
    appears from the minutes that a few weeks earlier the committee had decided
    to bring this very sequence into general use, but for yet another
    purpose...

    What is really "interesting" (so I think) is that this sequence
    (zwj,vir,cons, really 200D 0DCA) does not appear in the said document;
    neither does the expression "right-side"... So what is happening here?

    A bit of context is probably needed here, so I refer everyone to
    Michael's http://www.evertype.com/standards/si/iso10646-to-sls1134.html
    (thanks Michael!), written in 1997 (so anything there should be taken with a
    pinch of salt, particularly the use of the joiners), which describes, near
    the end, the problems with conjuncts in the Sinhala script.

    If I read correctly:

     -- the usual case, i.e. in Sinhala language (Elu), is to use explicit
    virama (al-lakuna, 0DCA); it is BUD-DHO in Michael's example; it does not
    need any joiner (<0DB6, 0DD4, 0DAF, 0DCA, 0DB0, 0DDC>);

     -- when a ligature conjunct, Brahmi-style, is requested, ZWJ/200D is put
    _after_ the virama; this also happens for rakaransaya (subjoined ra),
    yansaya (post-base ya) and repaya (similar to Nagari's repha), common in
    Sinhala; to stay with Michael's example, this one is BU-DDHO, and would be
    encoded <0DB6, 0DD4, 0DAF, 0DCA, 200D, 0DB0, 0DDC>.
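
    To make the two cases above concrete, here is a minimal Python sketch that
    builds both sequences from the code points quoted in this message and lists
    each character by name. The helper names (from_code_points, describe) are
    made up for illustration; the character names come from whatever Unicode
    database ships with the Python in use.

```python
import unicodedata

AL_LAKUNA = "\u0DCA"  # Sinhala virama (al-lakuna)
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def from_code_points(code_points):
    """Build a string from a list of integer code points."""
    return "".join(chr(cp) for cp in code_points)


# Explicit al-lakuna, no joiner (BUD-DHO in Michael's example).
BUD_DHO = from_code_points([0x0DB6, 0x0DD4, 0x0DAF, 0x0DCA, 0x0DB0, 0x0DDC])

# Ligature conjunct: ZWJ placed after the al-lakuna (BU-DDHO).
BU_DDHO = from_code_points(
    [0x0DB6, 0x0DD4, 0x0DAF, 0x0DCA, 0x200D, 0x0DB0, 0x0DDC]
)


def describe(text):
    """Return one 'U+XXXX NAME' line per character of text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text
    ]


if __name__ == "__main__":
    for label, text in (("BUD-DHO", BUD_DHO), ("BU-DDHO", BU_DDHO)):
        print(label)
        for line in describe(text):
            print("   ", line)
```

    The only difference between the two strings is the single ZWJ immediately
    after the al-lakuna; how the result is rendered depends entirely on the font
    and shaping engine.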

    Up to this point, I believe this is exactly what L2/04-131 / N2737 spells
    out (particularly 5.6 to 5.8).

    If we study Michael's document, we can see that the so-called Pali
    "kerned" conjuncts, BU-[DDH]O, are not addressed.

    So my educated guess (helped by documents recently made available in Sri
    Lanka) is that the cons/200D/0DCA/cons sequence is used to encode these
    "kerned" conjuncts or "touching letters". As a result, BU-[DDH]O ought to be
    encoded <0DB6, 0DD4, 0DAF, 200D, 0DCA, 0DB0, 0DDC>.
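
    The three encodings discussed in this message differ only in where ZWJ sits
    relative to the al-lakuna, which can be sketched as a small Python
    classifier. The function name and the labels "explicit", "ligature" and
    "touching" are made up here, and the "touching" reading (ZWJ before the
    al-lakuna) is only the guess made above, not something confirmed by
    L2/04-131 / N2737.

```python
AL_LAKUNA = "\u0DCA"  # Sinhala virama (al-lakuna)
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def classify_al_lakunas(text):
    """Return (index, kind) for every al-lakuna found in text.

    kind is a hypothetical label:
      "explicit" - no ZWJ next to the al-lakuna
      "ligature" - ZWJ directly after the al-lakuna
      "touching" - ZWJ directly before the al-lakuna (guessed meaning)
    If ZWJ appears on both sides, the "touching" test wins, arbitrarily.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != AL_LAKUNA:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before == ZWJ:
            kind = "touching"
        elif after == ZWJ:
            kind = "ligature"
        else:
            kind = "explicit"
        results.append((i, kind))
    return results


# The three sequences quoted in this message.
EXPLICIT = "\u0DB6\u0DD4\u0DAF\u0DCA\u0DB0\u0DDC"        # BUD-DHO
LIGATURE = "\u0DB6\u0DD4\u0DAF\u0DCA\u200D\u0DB0\u0DDC"  # BU-DDHO
TOUCHING = "\u0DB6\u0DD4\u0DAF\u200D\u0DCA\u0DB0\u0DDC"  # BU-[DDH]O (guess)

if __name__ == "__main__":
    for name, seq in (("explicit", EXPLICIT),
                      ("ligature", LIGATURE),
                      ("touching", TOUCHING)):
        print(name, classify_al_lakunas(seq))
```

    This only inspects the code point order; it says nothing about whether a
    given font or shaping engine actually renders the three forms differently.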

    Can someone confirm this?

    Also, can someone confirm that what is described here is actually what will
    be put in SLS 1134 rev. 2 (or the best approximation of it)?

    Antoine


