Re: kurdish sorani

From: Philippe Verdy (
Date: Wed Aug 30 2006 - 10:53:21 CDT


    From: "Behnam" <>
    > The point I want to make is, in searching an answer for your question as
    > 'what is Kurdish heh', one should be certain that the shapes of
    > initial, medial and final forms are not just a matter of optional
    > taste, but irrevocable rules.
    > If this is clarified, then yes, I agree with you that Kurdish heh
    > requires its own code.

    If Kurdish has a clear orthographic distinction between E and H, then this is not a calligraphic choice, and the four forms of the Arabic Heh cannot be mixed freely, because doing so could break the distinction between E and H.
    If one must encode the letter for Kurdish H separately (it will use only two forms of the Heh), one must also encode Kurdish E, so that it never collides with H because of calligraphic conventions.
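
    As a small illustration of the "four forms" mentioned above, the sketch below lists the Arabic Presentation Forms-B code points for the contextual shapes of U+0647 and checks that each decomposes back to the same base letter. It uses only Python's standard unicodedata module; the names HEH_FORMS and base_letter are made up for this example and say nothing about how Kurdish text is or should be encoded.

```python
import unicodedata

HEH = "\u0647"  # ARABIC LETTER HEH

# Presentation-form code points for the four contextual shapes of Heh.
HEH_FORMS = {
    "isolated": "\uFEE9",
    "final": "\uFEEA",
    "initial": "\uFEEB",
    "medial": "\uFEEC",
}

def base_letter(presentation_form: str) -> str:
    """Return the base letter named in a compatibility decomposition."""
    # The decomposition string looks like '<final> 0647';
    # the last field is the base code point in hexadecimal.
    fields = unicodedata.decomposition(presentation_form).split()
    return chr(int(fields[-1], 16))

for form, ch in HEH_FORMS.items():
    base = base_letter(ch)
    print(f"{form:9} U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")
```

    All four shapes map back to the single letter U+0647, which is why the shape alone cannot carry an E/H distinction in plain text unless something else is encoded.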

    So it looks like both letters must be made clearly distinct, independently of the font used. This can be done in several ways, but perhaps the cleanest would be to add some combining character that qualifies the letter wherever it is known to collide with Arabic usage.

    But then there's the case of Urdu and Uighur. How many letters will we need?

    Why not encode new format controls that override the joining type of the letter encoded next to them, and only that letter (independently of its left or right context)? ZWNJ and ZWJ do not play this role correctly, because each of them affects the joining behavior of the letters on both of its sides. What is needed is:

    * a ZERO-WIDTH BEFORE-JOINER (that forces the previous RTL character to adopt a right-joining form)
    * a ZERO-WIDTH NON BEFORE-JOINER (that forces the previous RTL character to adopt a right-disjoining form)
    * a ZERO-WIDTH AFTER-JOINER (that forces the next RTL character to adopt a left-joining form)
    * a ZERO-WIDTH NON AFTER-JOINER (that forces the next RTL character to adopt a left-disjoining form)
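
    The four controls listed above could be modelled roughly as follows. This is only a sketch under loud assumptions: none of these controls exists in Unicode, so Private Use Area code points stand in for them; the joining-type table covers just four letters; and the reading that the BEFORE controls govern the right side of the preceding letter while the AFTER controls govern the left side of the following letter is one literal interpretation of the list, not a settled definition.

```python
# HYPOTHETICAL controls: these are NOT Unicode characters.
# Private Use Area code points are placeholders for illustration only.
ZW_BEFORE_JOINER = "\uE000"      # previous letter: force right-joining
ZW_NON_BEFORE_JOINER = "\uE001"  # previous letter: force right-disjoining
ZW_AFTER_JOINER = "\uE002"       # next letter: force left-joining
ZW_NON_AFTER_JOINER = "\uE003"   # next letter: force left-disjoining

HEH, BEH, ALEF, DAL = "\u0647", "\u0628", "\u0627", "\u062F"

# Simplified joining types: D = dual-joining, R = right-joining only.
JOINING_TYPE = {HEH: "D", BEH: "D", ALEF: "R", DAL: "R"}

# (joins on right side, joins on left side) -> contextual form
FORM = {
    (False, False): "isolated",
    (False, True): "initial",
    (True, True): "medial",
    (True, False): "final",
}

def resolve_forms(text):
    """Return [(letter, form)] after applying the hypothetical overrides."""
    letters = []         # entries: [char, right_override, left_override]
    pending_left = None  # set by an AFTER control, used by the next letter
    for ch in text:
        if ch in (ZW_BEFORE_JOINER, ZW_NON_BEFORE_JOINER):
            if letters:  # applies to the letter encoded just before
                letters[-1][1] = (ch == ZW_BEFORE_JOINER)
        elif ch in (ZW_AFTER_JOINER, ZW_NON_AFTER_JOINER):
            pending_left = (ch == ZW_AFTER_JOINER)
        else:
            letters.append([ch, None, pending_left])
            pending_left = None

    def jt(i):
        # Letters missing from the table are treated as non-joining.
        return JOINING_TYPE.get(letters[i][0], "U")

    result = []
    for i, (ch, right_ov, left_ov) in enumerate(letters):
        # Default contextual joining from the neighbours' joining types.
        right = i > 0 and jt(i) in ("D", "R") and jt(i - 1) == "D"
        left = i + 1 < len(letters) and jt(i) == "D" and jt(i + 1) in ("D", "R")
        # An override changes this letter only; neighbours keep their defaults.
        if right_ov is not None:
            right = right_ov
        if left_ov is not None:
            left = left_ov
        result.append((ch, FORM[(right, left)]))
    return result

for sample in (BEH + HEH + BEH, BEH + HEH + ZW_NON_BEFORE_JOINER + BEH):
    print([form for _, form in resolve_forms(sample)])
```

    In this model the override touches only the marked letter, so the neighbouring letters keep whatever form their own context gives them; whether a real renderer should also adjust the neighbours is left open here.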

    And then integrate them into the BiDi and joining rendering rules. Depending on the situation, we would encode them after the letter (one of the first two controls) or before the letter (one of the last two controls), and so we would completely control the joining type for renderers. Note that this could be integrated in the renderer itself, without needing to change or upgrade existing fonts, or in the fonts themselves if the renderer is not changed. The encoded pairs would also carry clear semantics for making the necessary distinctions.

    In this case, only U+0647 is needed and we don't need to look for new code points for specific letters. It also becomes possible to assign keystrokes directly to these pairs for Kurdish, Urdu, Uighur, and who knows what else.
