Re: chairless hamza (in reply to John)

From: John Hudson (john@tiro.ca)
Date: Sat Jan 05 2008 - 01:32:38 CST

    arno wrote:

    > a chairless hamza after a dual joining Arabic letter followed by a
    > joining Arabic letter is ALWAYS either transparent (between lam and
    > alef) or inserts a tatweel like connection between the two letters
    > ALWAYS

    Yes, I understand all of this. I avoid the term tatweel, though, because elongation models
    in Arabic are style-specific and insertion of a horizontal extender is
    technology-specific, i.e. there are styles in which tatweel is inappropriate and there are
    technologies that implement elongation without inserting extender glyphs.

    > = in MSA it is a typo, that's why your fonts do not behave
    > properly, because the designers do not envision the case (whenever
    > somebody write it on the machine, she immediately corrects it);

    I don't understand this section of your message.

    > As far as Arabic is concerned -- and this is of course an important
    > qualification -- all your arguments against modifying the official
    > joining behaviour of chairless hamza are baseless.

    Note the careful distinction between character and glyph in the following discussion:

    Let's say we have a typical Arabic font. And we want to display the sequence lam +
    chairless hamza + alif. You suggest that the definition of U+0621 be changed so that it
    will not interrupt the joining of lam + alif, i.e. that it will be transparent (which we
    all agree, I think, is how it should be). What happens in most current fonts is that there
    is a single glyph associated with U+0621, and this *glyph* interrupts the formation of the
    lam+alif combination, which is most often handled as a ligature glyph (although not in
    the SIL fonts). So one ends up with a disconnected sequence:

            لءا

    This is not merely an issue of the current Unicode properties for this character, but also
    of existing implementations of those properties in fonts and other software. Simply
    changing the Unicode properties doesn't make these implementations go away or magically
    make them work with the new properties.

    So let's say that a layout engine that implements your proposed new properties for U+0621
    encounters a typical Arabic OpenType font that does not. The layout engine treats U+0621
    as transparent, and applies appropriate shaping to the lam and alif. Even so, there is no
    guarantee that the lam+alif ligature in the font will correctly form, because the hamza
    *glyph* interrupts the lookup sequence:

            Lam.init Alif.fina -> Lam_Alif

    That exact sequence of glyphs needs to be present in order for the ligature to form, but
    you are probably going to end up with:

            Lam.init Hamza Alif.fina

    In order for the hamza to be ignored in this sequence, it needs to be treated in the font
    as a non-spacing mark -- which is of course exactly what you want it to be in this context
    -- but that means it has to be defined as such in the font GDEF table. Since the font has
    been built around the assumption that U+0621 is a non-joining character, the glyph is not
    defined as a non-spacing mark and is not ignored during ligature formation.
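
    As a rough illustration of the matching behaviour described above, here is a minimal
    Python sketch. It is not how any particular layout engine is written: the function name
    apply_ligature, the BASE/MARK constants and the classes dictionary are all hypothetical
    stand-ins for a ligature lookup that ignores marks and for the GDEF glyph classes.

```python
# Hypothetical glyph classes, loosely modelled on GDEF glyph class definitions.
BASE, MARK = "base", "mark"


def apply_ligature(glyphs, classes, components, ligature, ignore_marks=True):
    """Replace the first match of `components` in `glyphs` with `ligature`.

    When ignore_marks is True, glyphs classed as MARK that sit between the
    components are skipped while matching and are emitted after the ligature.
    Any other intervening glyph makes the match fail.
    """
    for start in range(len(glyphs)):
        matched, skipped, i = [], [], start
        while i < len(glyphs) and len(matched) < len(components):
            g = glyphs[i]
            if g == components[len(matched)]:
                matched.append(g)
            elif ignore_marks and matched and classes.get(g) == MARK:
                skipped.append(g)
            else:
                break
            i += 1
        if len(matched) == len(components):
            return glyphs[:start] + [ligature] + skipped + glyphs[i:]
    return list(glyphs)  # no match: sequence is left unchanged


run = ["Lam.init", "Hamza", "Alif.fina"]
rule = (["Lam.init", "Alif.fina"], "Lam_Alif")

# Typical existing font (assumption): Hamza is classed as a base glyph.
as_base = apply_ligature(run, {"Hamza": BASE}, *rule)

# Updated font (assumption): Hamza is classed as a non-spacing mark.
as_mark = apply_ligature(run, {"Hamza": MARK}, *rule)
```

    With Hamza classed as a base glyph the run comes back unchanged, so no ligature forms.
    Classed as a mark, it is skipped during matching and ends up after Lam_Alif.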

    But let's pretend for a moment that the lam+alif ligature does form correctly: you are
    left with this (typically quite large) hamza glyph that is now reordered after the
    ligature glyph (which is what happens when a ligature forms while ignoring an intermediary
    glyph), and there is no information in the font telling the layout engine what to do with this
    glyph: no substitution information that will convert it into a non-spacing mark glyph, no
    positioning information that will locate it correctly relative to the lam+alif.
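
    The missing positioning data can be pictured with a simplified Python sketch of
    mark-to-ligature attachment. Everything here is an assumption made for illustration: the
    function name position_mark, the anchor tables and the coordinates are invented, and real
    GPOS lookups carry more detail than this.

```python
def position_mark(ligature, component, mark, lig_anchors, mark_anchors):
    """Return the (dx, dy) offset that attaches `mark` to one component of
    `ligature`, or None when the font supplies no anchor data for the pair.

    Offsets are relative to the ligature glyph's origin (a simplification).
    """
    base = lig_anchors.get((ligature, component))
    attach = mark_anchors.get(mark)
    if base is None or attach is None:
        return None  # the engine has nothing to go on; the glyph stays put
    return (base[0] - attach[0], base[1] - attach[1])


# Existing font (assumption): no anchors were ever defined for the hamza.
old_font = position_mark("Lam_Alif", 0, "Hamza", {}, {})

# Updated font (assumption, made-up coordinates): a mark form of the hamza
# with an anchor, plus an anchor on the first component of the ligature.
new_font = position_mark(
    "Lam_Alif", 0, "Hamza.mark",
    {("Lam_Alif", 0): (300, 700)},
    {"Hamza.mark": (150, 0)},
)
```

    In the first call nothing is returned, which is the situation described above; only the
    second, with data that existing fonts do not contain, yields an offset.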

    Whether chairless hamza is addressed at the new character level (as suggested by Khaled)
    or at the character properties level (as suggested by you) or at the display level (as
    suggested by me), layout engines and fonts are going to need to be updated to handle
    it correctly. My concern is that the solution should avoid actually breaking existing
    implementations, and the way to avoid this seems to be to not tamper with U+0621. I
    wouldn't mind seeing U+0621 formally deprecated -- i.e. maintained in the standard, but
    not recommended to be used -- and a new character with correct properties and
    implementation guidelines introduced to replace it. That would effectively freeze the
    current implementations for that character, allow them to continue to function as they
    have been, and not introduce new demands for the handling of this character that existing
    layout engines and fonts cannot meet.

    John Hudson

    -- 
    Tiro Typeworks        www.tiro.com
    Gulf Islands, BC      tiro@tiro.com
    The Lord entered her to become a servant.
    The Word entered her to keep silence in her womb.
    The thunder entered her to be quiet.
                 -- St Ephrem the Syrian
    

