Re: kurdish sorani

From: Behnam (behnam.rassi@gmail.com)
Date: Tue Aug 29 2006 - 16:18:44 CDT


    On 28-Aug-06, at 9:27 PM, Andries Brouwer wrote:

    > On Mon, Aug 28, 2006 at 04:18:20PM -0400, Behnam Rassi wrote:
    >> Hi,
    >>
    >> I agree with John Hudson. Kurdish E can be achieved by U+06D5
    >
    > Yes. But then what is Kurdish H?
    >
    >> The other problem is with the definition of Arabic Heh itself and not
    >> any particular locale. Arabic Heh is an exception in that it has five
    >> forms. The fifth form is 'abbreviated form' which is a non-joining
    >> character used for abbreviation and enumeration.
    >> Worse, this form is wrongly presented in Unicode PDF files as the
    >> representative of Arabic letter Heh, which indeed should be the oval
    >> form.
    >> If the fifth form gets its own code, it may solve the problem in
    >> Kurdish and many other languages as well.
    >
    > But Unicode does not encode shapes but semantics.
    > So if two languages each have a Heh, but the shaping behaviour
    > differs,
    > then in principle different code points are required.
    > That is why there is U+06CC next to U+064A (and U+0649).

    The use of shapes, particularly in the 'heh' family, among the
    different languages written in Arabic script is very fluid and
    interchangeable. What is defined as the medial and final forms of heh
    goal for Urdu can easily be used in Persian, or in Arabic for that
    matter. Some believe this is a calligraphic choice of the font maker;
    I believe it should be an optional choice of the user in encoding. But
    this is another story. The point I want to make is that, in searching
    for an answer to your question 'what is Kurdish heh', one should be
    certain that the shapes of the initial, medial and final forms are not
    just a matter of optional taste, but irrevocable rules.
    If this is clarified, then yes, I agree with you that Kurdish heh
    requires its own code.
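
    For reference, the code points that come up in this thread can be
    listed with their Unicode character names. This is only a small
    Python sketch using the standard unicodedata module; grouping these
    four characters as a 'heh family' is my own selection for
    illustration, not a Unicode category.

```python
import unicodedata

# Code points mentioned in this thread. Treating them as one "heh family"
# is an assumption made for illustration only.
HEH_FAMILY = [0x0647, 0x06BE, 0x06C1, 0x06D5]


def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using the Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in HEH_FAMILY:
        print(describe(cp))
```

    The names say nothing about shaping behaviour; they only identify
    which character is being discussed.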

    > If I understand you correctly, your fifth form of Heh is
    > the isolated form that now is commonly represented using
    > U+0647,U+200D ?

    Yes, which means it is already encoded differently from U+0647, so why
    not give it its own code? And why show it as the representative of the
    letter heh in the Unicode PDF?
    This is a practical demonstration of an irrevocable rule in Arabic and
    Persian: this shape is never used within a sentence, only as an
    isolated non-joining form for abbreviation and enumeration. The
    initial form merely has a similar shape; its contextual behavior is
    totally different.
    The problem with 'U+0647,U+200D' is that it produces the visible
    initial form and not the real shape of heh dochashme isolated. To
    rectify that, I put a substitution glyph for this combination in my
    fonts. Fonts that don't have this substitution produce an initial
    shape, which is calligraphically incorrect. Technically, it is also
    still a combination that joins to its left, whereas the abbreviated
    form (heh dochashme isolated) shouldn't.
    Incidentally, as was mentioned in this thread, the four-form behavior
    of the Urdu letter heh dochashme is disputed. If it is established
    that this is a right-joining-only letter, then it can more easily be
    used for the abbreviated form. At least it would be a much better
    option than 'U+0647,U+200D'.
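
    To make the joining argument concrete, here is a rough Python sketch
    of how a shaping engine picks a contextual form from joining types.
    It is a simplification under stated assumptions: the joining types
    are hard-coded from my reading of ArabicShaping.txt (D = dual, R =
    right, C = join-causing), transparent marks are ignored, and the
    function name contextual_form is hypothetical. It is not how any
    particular font or renderer is implemented.

```python
# Joining types hard-coded for this sketch (assumed from ArabicShaping.txt).
JOINING_TYPE = {
    "\u0647": "D",  # ARABIC LETTER HEH: dual-joining
    "\u06BE": "D",  # ARABIC LETTER HEH DOACHASHMEE: dual-joining
    "\u06D5": "R",  # ARABIC LETTER AE: right-joining
    "\u200D": "C",  # ZERO WIDTH JOINER: join-causing
}


def joining_type(ch):
    """Return the joining type, treating unknown characters as non-joining."""
    return JOINING_TYPE.get(ch, "U")


def contextual_form(text, i):
    """Return 'isolated', 'initial', 'medial' or 'final' for text[i].

    Simplified: looks only at the immediate neighbours in logical order.
    """
    jt = joining_type(text[i])
    prev_jt = joining_type(text[i - 1]) if i > 0 else "U"
    next_jt = joining_type(text[i + 1]) if i + 1 < len(text) else "U"

    # The letter connects to the preceding character only if it can join
    # on that side and the preceding character joins toward it.
    joins_prev = jt in ("D", "R") and prev_jt in ("D", "L", "C")
    # Likewise for the following character.
    joins_next = jt in ("D", "L") and next_jt in ("D", "R", "C")

    if joins_prev and joins_next:
        return "medial"
    if joins_prev:
        return "final"
    if joins_next:
        return "initial"
    return "isolated"


if __name__ == "__main__":
    print(contextual_form("\u0647", 0))        # HEH alone
    print(contextual_form("\u0647\u200D", 0))  # HEH followed by ZWJ
    print(contextual_form("\u06D5", 0))        # AE alone
```

    Under these assumptions, U+0647 followed by U+200D selects the
    initial form because the joiner asks for a connection on the
    following side, which is the behaviour I object to above. A
    right-joining letter in the same position stays isolated.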

    Behnam


