Extending the semantics of ZWJ and ZWNJ for Indic scripts

From: Andy White (Andy__White@btinternet.com)
Date: Mon Nov 18 2002 - 18:41:59 EST


    A version of this message with some graphics is available here
    http://www.exnet.btinternet.co.uk/KhandaWeb/extending.htm

    The function of ZWJ and ZWNJ in regard to Indic scripts is to alter the
    shaping of a preceding consonant+Virama. In some Indic scripts
    (Bengali, Oriya, and possibly Traditional Malayalam), however, a device
    to control the shaping of a following Virama+consonant may also be
    desired. Examples of this are found when the first consonant is the
    letter Ra.

    For example, the sequence Ra+Virama+Ya may be rendered as reph+Ya or as
    Ra+Ya.secondaryForm. This secondary form may or may not ligate with the
    preceding character.

    For the purposes of this discussion, I will use Bengali examples, as
    that is what I know best.

    In Bengali, the secondary form of Ya is called jophola (AKA zofola,
    jofola, japhala, & yaphala).

    Jophola is often used to transcribe sounds foreign to Bengali.

    For example, to write the English word 'rat', one would use the sequence
    Ra+Virama+Ya+VowelSignA+TTa and would expect it to be rendered as
    Ra+Jophola+VowelSignA+TTa. The question is, how would a rendering device
    know which form of Ra+Virama+Ya was intended (reph+Ya or Ra+Jophola)?
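
    To make the ambiguity concrete, here is a minimal Python sketch that
    spells out the 'rat' sequence as Unicode code points. It assumes that
    "VowelSignA" means BENGALI VOWEL SIGN AA (U+09BE); the describe()
    helper is hypothetical and only there for illustration.

```python
import unicodedata

# Bengali code points for the sequence discussed above.
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
AA_SIGN = "\u09BE"  # assumption: "VowelSignA" is BENGALI VOWEL SIGN AA
TTA = "\u099F"

# Ra+Virama+Ya+VowelSignA+TTa, the intended spelling of 'rat'.
rat = RA + VIRAMA + YA + AA_SIGN + TTA


def describe(text):
    """Return one 'U+XXXX NAME' string per character (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(rat):
    print(line)
```

    Nothing in these five code points says whether the first three should
    become reph+Ya or Ra+Jophola, which is the problem described above.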

    My first thought was to put a ZWNJ after the Ra to indicate that it is
    not to form a reph, e.g.
    Ra+ZWNJ+Virama+Ya = Ra+Jophola

    Then I remembered that in some font designs, secondary forms such as
    jophola can form a conjunct ligature with the preceding consonant.
    However, I think that a ZWNJ would imply that Ra and Ya should not
    ligate.
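
    As a rough sketch of this first idea, the encoding side could look like
    the Python below. The function name request_jophola_zwnj is
    hypothetical, and this only shows where the ZWNJ would sit in the text;
    it says nothing about how a font would shape the result, which is where
    the ligature concern arises.

```python
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def request_jophola_zwnj(text):
    """Rewrite every Ra+Virama+Ya as Ra+ZWNJ+Virama+Ya (hypothetical helper)."""
    return text.replace(RA + VIRAMA + YA, RA + ZWNJ + VIRAMA + YA)


encoded = request_jophola_zwnj(RA + VIRAMA + YA)
print([f"U+{ord(ch):04X}" for ch in encoded])
```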

    My second idea is to use the primary semantic of ZWJ as used in
    non-Indic scripts. For example, in the sequence Ra+Virama+ZWJ+Ya, the
    ZWJ would imply that Virama and Ya should combine to make the secondary
    form.

    This rule would only apply if the 'base' consonant were the letter Ra.
    (This rule would not apply to Devanagari because: 1. it is not an
    affected script, and 2. in Devanagari, Ra+Virama+ZWJ = eyelash_Ra.)
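
    A minimal Python sketch of how a renderer might apply this second idea
    to a Bengali cluster follows. The function name ra_ya_form and the
    returned labels are hypothetical, and treating the unmarked
    Ra+Virama+Ya as reph+Ya is an assumption about the default, not
    something any specification states.

```python
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"  # Bengali Ra, Virama, Ya
KA = "\u0995"                                   # Bengali Ka, used as a non-Ra base
ZWJ = "\u200D"                                  # ZERO WIDTH JOINER


def ra_ya_form(cluster):
    """Classify a cluster under the proposed rule (hypothetical helper)."""
    if cluster == RA + VIRAMA + ZWJ + YA:
        # Proposed: ZWJ joins Virama and Ya into the secondary form.
        return "Ra+Jophola"
    if cluster == RA + VIRAMA + YA:
        # Assumed default when no joiner is present.
        return "reph+Ya"
    # The rule applies only when the base consonant is Ra.
    return "unaffected"


print(ra_ya_form(RA + VIRAMA + ZWJ + YA))
print(ra_ya_form(RA + VIRAMA + YA))
print(ra_ya_form(KA + VIRAMA + ZWJ + YA))
```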

    Andy


