Re: Arabic aleph representation of glyphs

From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Mon Mar 29 2010 - 23:11:39 CST


        I know nothing of Arabic, but the behaviour you describe on a manual
    typewriter, typing the combining character tanween-al-fatah before
    the aliph, sounds like a red herring. Compare a French typewriter
    with dead keys for accents, where you type the acute accent first
    (with no carriage advance), then the base letter (e.g., "e" to type
    "café"). In the character stream that's generated, the sequence must
    still be "latin small letter e" then "combining acute accent" (or its
    canonical equivalent), even though you typed them in the other
    order. That is the business of the keyboard driver.
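A minimal sketch of that keyboard-driver step, in Python. The dead-key table `DEAD_KEYS` and the function `type_keys` are hypothetical, and only the acute accent is modelled. The mark typed first is held back and emitted after the base letter; NFC then composes the pair into the precomposed letter.

```python
import unicodedata

# Hypothetical dead-key table: dead key -> combining mark it produces.
DEAD_KEYS = {"'": "\u0301"}  # acute dead key -> U+0301 COMBINING ACUTE ACCENT

def type_keys(keys: str) -> str:
    """Turn a keystroke sequence into stored text: base letter first, then mark."""
    out = []
    pending = None  # combining mark waiting for its base letter
    for key in keys:
        if key in DEAD_KEYS:
            pending = DEAD_KEYS[key]  # dead key: nothing emitted yet
        else:
            out.append(key)
            if pending is not None:
                out.append(pending)  # the mark follows the base in memory
                pending = None
    if pending is not None:
        out.append(pending)  # dead key with no letter after it is emitted alone
    return "".join(out)

stored = type_keys("caf'e")                 # accent typed before the "e"
print(stored == "cafe\u0301")               # True: stored as e + U+0301
print(unicodedata.normalize("NFC", stored)) # café
```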

        Unicode's design philosophy is that the combining character always
    follows the base character in memory, however that has to be arranged.
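Normalization cannot do this arranging on the driver's behalf. Canonical reordering only sorts consecutive combining marks by their combining class; it never moves a mark across a base character (a starter). A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

FATHATAN = "\u064B"  # ARABIC FATHATAN (tanween-al-fatah)
ALEF = "\u0627"      # ARABIC LETTER ALEF

typed_order = FATHATAN + ALEF   # mark first, as on the manual typewriter
stored_order = ALEF + FATHATAN  # base first, as Unicode expects

for s in (typed_order, stored_order):
    for form in ("NFC", "NFD"):
        # Neither string is changed: normalization does not swap a mark with a base.
        assert unicodedata.normalize(form, s) == s

# Marks after the same base ARE reordered by canonical combining class:
# U+0323 COMBINING DOT BELOW (ccc 220) sorts before U+0301 COMBINING ACUTE ACCENT (ccc 230).
print(unicodedata.normalize("NFD", "a\u0301\u0323") == "a\u0323\u0301")  # True
```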

        Apologies if this is off-point. Cheers

    Jonathan Coxhead
    Foster City CA 94404

    On 2010-03-28 2:03 pm, CE Whitehead wrote:
    >
    >
    > Hi!
    > I still have questions about line-breaking and collation for the
    > tanween-al-fatah (U+064B ARABIC FATHATAN, ً)* when seated on the aliph
    > (U+0627 ARABIC LETTER ALEF, ا)*.
    > (* In an Arabic word, the tanween-al-fatah can only sit on, or be
    > combined with, the aliph or the tah marbutah at the end of the word.
    > It sits above and slightly to the right of the aliph (in a right-to-left
    > context), and above and slightly to the left of the tah-marbutah (in a
    > left-to-right context).)
    >
    > The tanween-al-fatah is classified by Unicode as a non-starter
    > character, that is, a combining mark, as far as I can tell.
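That reading matches the Unicode Character Database: U+064B ARABIC FATHATAN has general category Mn (nonspacing mark) and a nonzero canonical combining class, and a nonzero class is what makes a character a non-starter. Python's `unicodedata` exposes both properties; the helper `describe` is just for illustration:

```python
import unicodedata

def describe(ch: str) -> str:
    """Name, general category and canonical combining class of one code point."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"category={unicodedata.category(ch)}, ccc={unicodedata.combining(ch)}")

for ch in ("\u064B", "\u0627", "\u0629"):
    print(describe(ch))
# U+064B ARABIC FATHATAN: category=Mn, ccc=27
# U+0627 ARABIC LETTER ALEF: category=Lo, ccc=0
# U+0629 ARABIC LETTER TEH MARBUTA: category=Lo, ccc=0
```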
    >
    > However, I was under the impression that it was traditionally typed
    > (on the old manual typewriters) before the aliph.
    >
    > (1) I did read
    > (in http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
    > that ". . . a single combining mark is a (degenerate) combining
    > character sequence".
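In that sense a mark stored before the aliph is not attached to the aliph at all: it combines with whatever precedes it or, at the start of the text, stands alone as a degenerate sequence. A simplified splitter, assuming a "mark" is any code point whose general category starts with M (the hypothetical `combining_sequences` is not the full UAX #29 grapheme-cluster algorithm, which also handles Prepend, Hangul, emoji and more):

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Split text into combining character sequences (simplified)."""
    seqs: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and seqs:
            seqs[-1] += ch   # attach the mark to the preceding sequence
        else:
            seqs.append(ch)  # a base character, or a mark with nothing before it
    return seqs

print([len(s) for s in combining_sequences("\u064B\u0627")])  # [1, 1]: degenerate mark, then alef
print([len(s) for s in combining_sequences("\u0627\u064B")])  # [2]: alef + fathatan
# Beh, yeh, teh + fathatan, alef: the mark typed "before the alif" sits on the teh.
print([len(s) for s in combining_sequences("\u0628\u064A\u062A\u064B\u0627")])  # [1, 1, 2, 1]
```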
    >
    > There is also something called "prepending" (what is that? does it
    > apply?).
    > I am wondering whether it would be possible to treat
    > a tanween-al-fatah preceding an aliph at a word's end as an irregular
    > sequence, so that it can have a compatibility mapping to aliph
    > followed by tanween-al-fatah?
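The proposal above can be prototyped as an application-level folding step; it is not a Unicode compatibility mapping, and no normalization form performs it. A minimal sketch, assuming the hypothetical helper `fold_tanween`, and assuming "end of word" means the alef is not followed by another Arabic letter or mark (U+0621..U+065F):

```python
import re

# Hypothetical folding rule: FATHATAN (U+064B) immediately before a word-final
# ALEF (U+0627) is moved after the alef. Not part of any Unicode normalization form.
_TANWEEN_BEFORE_FINAL_ALEF = re.compile("\u064B\u0627(?![\u0621-\u065F])")

def fold_tanween(text: str) -> str:
    """Rewrite fathatan + word-final alef as alef + fathatan."""
    return _TANWEEN_BEFORE_FINAL_ALEF.sub("\u0627\u064B", text)

word = "\u0628\u064A\u062A\u064B\u0627"              # tanween on the teh, then alef
print(fold_tanween(word) == "\u0628\u064A\u062A\u0627\u064B")  # True: tanween now on the alef
```

Note that such a fold is lossy: it erases the spelling distinction discussed in the Arabic forum thread quoted below, so it would suit a sort or search key better than stored text.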
    > (I am wondering otherwise how it works with Line Breaking Rule 9 [LB9
    > in http://www.unicode.org/reports/tr14/proposed.html#BreakingRules]:
    > "LB9 Do not break a combining character sequence; treat it as if it
    > has the line breaking class of the base character in all of the
    > following rules.
    > . . .
    > "At any possible break opportunity between CM and a following
    > character, CM behaves as if it had the type of its base character.")
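Rules LB9 and LB10 can be sketched as a pass that resolves classes over already-classified characters. The table `LINE_BREAK` here is a hypothetical four-entry stand-in for the real Line_Break property data (Arabic letters are AL, Arabic marks such as U+064B are CM). Under LB9 a CM takes the class of its base unless that base is a space or hard break; LB10 then treats any CM left without a base as AL:

```python
# Hypothetical excerpt of the Line_Break property (see UAX #14 for the real data).
LINE_BREAK = {"\u0627": "AL", "\u062A": "AL", "\u064B": "CM", " ": "SP"}

# Classes that a following CM may not attach to under LB9.
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}

def resolve_classes(text: str) -> list[str]:
    """Give each CM the class of its base (LB9); a leftover CM becomes AL (LB10)."""
    resolved: list[str] = []
    for ch in text:
        cls = LINE_BREAK[ch]
        if cls == "CM":
            if resolved and resolved[-1] not in NO_ATTACH:
                cls = resolved[-1]  # LB9: inherit the base character's class
            else:
                cls = "AL"          # LB10: no usable base, treat as AL
        resolved.append(cls)
    return resolved

print(resolve_classes("\u0627\u064B"))  # ['AL', 'AL']  mark after alef
print(resolve_classes("\u064B\u0627"))  # ['AL', 'AL']  leading mark, via LB10
print(resolve_classes(" \u064B"))       # ['SP', 'AL']  mark after a space
```

Either way the tanween ends up with class AL, so its position before or after the aliph does not by itself create a break opportunity in this sketch.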
    >
    >
    > (2) I do think, however, that a tanween-al-fatah preceding the aliph
    > should be sorted the same as, and generally matched in a search to,
    > the aliph + tanween-al-fatah sequence;
    > that is, it needs to be reordered somehow, even though the
    > tanween-al-fatah is a non-starter character,
    > in order for texts to sort properly.
    > (However, I think searching in Arabic should generally match consonants
    > with or without diacritics, and the tanween-al-fatah is just a diacritic;
    > but I don't generally do searches in Arabic.)
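For the diacritic-insensitive matching described in (2), one common approach (assumed here; real Arabic search engines and the Unicode Collation Algorithm do considerably more) is to decompose the text and drop the nonspacing marks before comparing. With the hypothetical `search_key` below, both placements of the tanween produce the same key:

```python
import unicodedata

def search_key(text: str) -> str:
    """Diacritic-insensitive key: NFD, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Mn covers the Arabic harakat, including U+064B FATHATAN.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

on_teh = "\u0628\u064A\u062A\u064B\u0627"   # tanween on the letter before the alif
on_alef = "\u0628\u064A\u062A\u0627\u064B"  # tanween on the alif
print(search_key(on_teh) == search_key(on_alef) == "\u0628\u064A\u062A\u0627")  # True
print(search_key("caf\u00e9"))  # cafe
```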
    >
    > (3) Regarding security, I don't see a serious problem, though I had
    > trouble viewing all the characters at:
    > http://www.unicode.org/reports/tr36/idn-chars.html
    > (Since I can't read all of the characters I am not certain, but
    > it seems the tanween-al-fatah is only allowed above the tah-marbutah
    > and with the aliph?
    > It seems vowels are disallowed except in remapped compatibility
    > characters, which I thought were to be shunned;
    > this means that the tanween-al-fatah by itself is disallowed.)
    >
    > Because the tanween-al-fatah apparently only occurs in the remapped
    > compatibility characters with the aliph or tah-marbutah, there should
    > not be security issues related to the fact that
    > it displays about the same whether it sits on the aliph or on the
    > preceding character,
    > although it would be nice if addresses with remapped compatibility
    > diacritics were bundled with addresses
    > with plain consonants/seats only.
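The "remapped compatibility" entries are presumably the presentation-form ligatures U+FD3C and U+FD3D (ARABIC LIGATURE ALEF WITH FATHATAN FINAL FORM / ISOLATED FORM), whose compatibility decompositions are alef + fathatan, so NFKC maps them to the ordinary two-character sequence. The sketch below shows that folding and adds a hypothetical check, `only_seated_tanween`, for the seating rule as described above (every fathatan directly after ALEF or TEH MARBUTA); it is an illustration, not the actual IDN rule set:

```python
import unicodedata

for ch in ("\uFD3C", "\uFD3D"):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          + " ".join(f"U+{ord(c):04X}" for c in folded))
# U+FD3C ARABIC LIGATURE ALEF WITH FATHATAN FINAL FORM -> U+0627 U+064B
# U+FD3D ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM -> U+0627 U+064B

SEATS = {"\u0627", "\u0629"}  # ALEF, TEH MARBUTA

def only_seated_tanween(label: str) -> bool:
    """Hypothetical policy: after NFKC, every FATHATAN must follow ALEF or TEH MARBUTA."""
    s = unicodedata.normalize("NFKC", label)
    return all(i > 0 and s[i - 1] in SEATS
               for i, ch in enumerate(s) if ch == "\u064B")
```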
    >
    > (NOTE: I also checked out the online discussion in Arabic that you
    > all pointed to some time back, on the tanween-al-fatah when seated on
    > the 'alif, in case anyone wants to help translate/explain it
    > (it was at:
    > http://www.ahlalhdeeth.com/vb/sendmessage.php,
    > but I can no longer access the link):
    > مثلاً: (بيتًا) أم (بيتاً)؟
    > For example: (بيتًا) or (بيتاً)?
    > Both read baytan; only the placement of the tanween-al-fatah, the two
    > little slashes above the word, differs.
    > ولكن الأكثرين على أن التنوين يوضع على ما قبل الألف
    > But most hold that the tanween is placed on the letter before the alif.
    > ولذلك لا تجد في تحقيقات القدماء من المحققين إلا وضعها قبل الألف
    > That is why, in the critical editions prepared by the earlier editors,
    > you find it placed only before the alif (in a right-to-left context).
    > الذي يحسم الخلاف في نظري
    > What settles the disagreement, in my view,
    > أن وضعها على الألف يجرّد الحرق السابق من الشكل، مع أنه أحق بالشكل
    > is that placing it on the alif strips the preceding letter of its
    > vowel marking, even though that letter has the better claim to it.
    > { الحرق, "burning", is a typo for الحرف, "letter". }
    > ولكن بعضهم يشكل الحرف السابق أيضا فيضع فتحة عليه
    > But some of them also vocalize the preceding letter, putting a fatha
    > on it.
    > { Does this mean that, in that style, the preceding letter carries a
    > plain fatha while the tanween-al-fatah sits on the aliph? })
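In Unicode terms the two spellings in this example are different code point sequences, and they are not canonically equivalent, because normalization never moves U+064B across the alif. Listing the character names makes the difference visible (the labels in `spellings` are illustrative):

```python
import unicodedata

spellings = {
    "tanween on the teh": "\u0628\u064A\u062A\u064B\u0627",
    "tanween on the alif": "\u0628\u064A\u062A\u0627\u064B",
}
for label, word in spellings.items():
    print(label, [unicodedata.name(ch) for ch in word])

a, b = spellings.values()
# Same letters, different position of the mark: not canonically equivalent.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # False
```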
    >
    > Best,
    > C. E. Whitehead
    > cewcathar@hotmail.com




