Re: Chairless/Amphibious hamza

From: John Hudson
Date: Fri Dec 21 2007 - 02:38:32 CST

    arno wrote:

    > I simply place hamza above a tatweel. From a printer's point of view
    > there is no problem.

    Tatweel, as an encoded character, is a hack inherited from metal typesetting technology.
    It is the elevation of an elongation of a letter into the status of a separate character.
    It is another example of a confusion of roles between the encoding of text and the display
    of text.

    >> The chairless hamza -- what Tom Milo calls the 'amphibious hamza'

    > Let's not mix up things!
    > The "amphibious hamza" does away -- at least for the Arabic language --
    > with all hamzas presently encoded -- or did I get it wrong?

    I don't think I'm mixing things up. There is the one hamza letter U+0621, which most fonts
    and layout engines treat as disjoining. This is the letter that Tom and I consider
    amphibious according to the grammar of the script (which is not the same thing as the
    grammar of a particular language). When the hamza occurs between two letters that would
    normally join, those letters should join and the hamza should float above between them.
    When the hamza occurs between two letters that would not normally join, it sits between
    them. It is a single letter that behaves as a floating mark in one context and as a
    spacing character in another context.
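
    The rule in the paragraph above can be written down as a small sketch. It is only an
    illustration under stated assumptions: the joining-type table is a hand-copied subset
    (D = dual-joining, R = right-joining, in the sense of Unicode's ArabicShaping.txt), the
    function name hamza_roles is hypothetical, and combining marks are simply skipped when
    looking for the neighbouring letters.

```python
import unicodedata

HAMZA = "\u0621"

# Hand-copied subset of joining types, for illustration only.
# D = joins on both sides, R = joins only to the preceding letter.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u0629": "R",  # TEH MARBUTA
    "\u0631": "R",  # REH
    "\u0634": "D",  # SHEEN
    "\u0642": "D",  # QAF
    "\u064A": "D",  # YEH
}


def _neighbour(text, index, step):
    """Return the nearest character in the given direction that is not a combining mark."""
    i = index + step
    while 0 <= i < len(text):
        if unicodedata.category(text[i]) != "Mn":
            return text[i]
        i += step
    return None


def hamza_roles(text):
    """List (index, role) for each U+0621, where role is 'floating' or 'spacing'."""
    roles = []
    for i, ch in enumerate(text):
        if ch != HAMZA:
            continue
        before = JOINING_TYPE.get(_neighbour(text, i, -1))
        after = JOINING_TYPE.get(_neighbour(text, i, +1))
        # The two neighbours would join each other only if the first one
        # joins forward (D) and the second one joins backward (D or R).
        would_join = before == "D" and after in ("D", "R")
        roles.append((i, "floating" if would_join else "spacing"))
    return roles


# Artificial sequence beh + hamza + beh: the neighbours would join.
print(hamza_roles("\u0628\u0621\u0628"))
# qaf, reh, alef, hamza, teh marbuta: alef does not join forward.
print(hamza_roles("\u0642\u0631\u0627\u0621\u0629"))
```

    Note that nothing in the sketch changes the character itself; the same U+0621 is
    classified differently purely from the joining behaviour of its neighbours.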

    Since the distinction is one of shaping and positioning, determined by the shaping
    behaviour of adjacent letters, I believe that this is properly addressed as a display
    issue and not as an encoding issue. That is, I do not believe a character-level
    distinction exists or should exist between the hamza between two joining letters and the
    hamza between two non-joining letters. The distinction is in the display.

    > I hope you are not only thinking about these things, but studying them.
    > So you must have some material about the influence of technology on
    > hamza writing. My impression is that -- speaking for Arabic only -- the
    > joining behaviour of the letters and vowels around the hamza determined
    > its chair, not technology.

    The joining behaviour of the letters and vowels around the hamza *should* determine its
    form and positioning. But the technology has largely determined that hamza has become a
    disjoining character, because that is how the character encoded as U+0621 has behaved in
    most typesetting systems for many years now.

    >> Returning to my point above: I think this can be looked at as a
    >> display issue, in which case the question becomes whether font formats
    >> and layout engines have suitable mechanisms to handle the contextual
    >> behaviour. OpenType does, and I believe Apple's AAT and SIL's Graphite
    >> do also. Tom Milo's ACE technology certainly does, as he has already
    >> implemented this.

    > I doubt it. Since there are different rules on the chair of hamza in
    > different lands and times -- not to speak of non-Arabic languages --
    > it's too hard a task even for Thomas Milo.

    Those different rules simply mean that we can't expect one font to satisfy all users, but
    there is nothing unusual in that. If the rules can be described, then they can be
    implemented in typography. The problem facing Arabic typography is that the rules have not
    been adequately described, and hence the implementations are not guided by the script
    tradition but by reference to previous implementations (metal type, phototype, various
    early digital type formats). And so one ends up with peculiarities like the large set of
    metal ligature forms in the Arabic Presentation Forms block and the tatweel character: the
    elevation of display mechanisms particular to specific technologies into the status of
    encoded characters. And over the past few years I have converted and mastered dozens of
    fonts from previous digital formats into OpenType for various clients, and in all of them
    the chairless hamza (U+0621) is presumed to be a disjoining letter because that is how the
    previous technologies had handled it.

    >> In an ideal world, the amphibious hamza would never have developed a
    >> modern usage as a disjoining character,

    > As stated before: Please show where hamza is disjoining and not
    > disjoined by surrounding?

    This is the behaviour of every Arabic font on my system (and I have a lot of them), except
    those that Tom Milo and Mirjam Somers made to work with the DecoType ACE layout engine.
    I'm not saying that hamza should be disjoining: I'm saying that this is what it does as
    implemented in most typesetting solutions. It should be disjoined by its surroundings, but
    instead it disjoins those surroundings. And millions of Arab readers around the world must
    be living with this on a daily basis in newspapers and magazines and books and advertising
    and text on television and definitely on the Internet, and very few people seem to be
    conscious that this is incorrect. The sample sentences that I used in my online
    illustration of the issue were taken from, if I recall correctly, the BBC Arabic news; and
    unless most Arab readers are using fonts with which I am unfamiliar they are seeing the
    same disjoining hamza as I am seeing.

    >> and Unicode would encode it appropriately. As it is, Unicode has
    >> inherited a typesetting model that is at odds with the script
    >> tradition in a number of ways,

    > wayS -- which ones? (Maybe you could send me that material off the list.)

    The material is the fonts themselves. In this regard there are two kinds: those that were
    originally designed for previous typesetting technologies, including e.g. hot metal
    Linotype composition, and which have subsequently been converted or adapted to new
    generations of technologies and now to OpenType; and those that are new designs but which
    follow glyph sets determined by the existing fonts.

    In both, one can see certain inherited mechanisms such as the idea of representing the
    graphemic structure of the script via ligature glyphs, which are a mechanism dating to
    metal typography, and which cannot accurately represent all of the normative shapes in the
    grammar of the individual script styles. Monotype famously had a nasta'liq font that
    included more than 20,000 ligature glyphs, and they still couldn't correctly display every
    possible combination of letters in a word segment.

    And the King Fuad edition of 1924 itself displays limitations of the typesetting
    technology used, which also relied upon ligature glyphs and, hence, was able to accurately
    represent only some of the naskh script on which the type style was based. For instance,
    according to the grammar of the naskh script, the medial form of jim-shape letters always
    joins from the top, as is seen consistently in the exemplars of scribes and calligraphers;
    indeed, the resulting pattern of three jim-shape letters in succession is particularly
    dramatic and exemplars including it are among the most often reproduced in books on
    Islamic calligraphy. But the typeface of the King Fuad edition, if I recall correctly,
    contains only ligatures for two jim-shapes in succession; so when three jim-shapes are set
    in succession with this type the script system breaks down and two are joined vertically
    while one is joined horizontally.

    > To end constructively: If you want to go for a "chairless hamza"
    > you MUST restrict the allowed behaviour of existing characters.
    > Decomposition between hamza above/below and the four precomposed letters
    > (and high hamza and its three precomposed letters) is fairly
    > straightforward, but chaired and chairless hamza would have to be
    > separated or if that's too difficult, we must go for the ONE hamza.

    I'm not sure what you mean by 'restrict the allowed behaviour of existing characters'.

    I favour this solution:

    Leave the combining mark characters as they are; they seem to function fine and the
    decompositions are straightforward.

    Leave the Kazakh high hamza as it is and ignore it completely for Arabic language text.

    Treat U+0621 as an amphibious character at the display level.

    I don't *like* having to handle the joining behaviour of letters adjacent to hamza
    contextually in font lookups. Ideally I shouldn't have to. But changing the properties of
    U+0621 so that adjacent letters are made joining by compliant shaping engines would break
    a lot of software and pretty much all current fonts. I can't see Unicode doing that, so I
    think we're obliged to look for solutions at the display level.
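
    To make the display-level approach concrete, here is a toy model of the form selection
    a shaping engine and font would have to produce. It is a sketch in Python, not OpenType
    feature code, and everything in it is an assumption made for illustration: the tiny
    joining-type table, the function name shape, the amphibious flag and the form labels
    (isol, init, medi, fina, mark). With amphibious=False the hamza breaks the join, as in
    most current fonts; with amphibious=True it is stepped over when the neighbours would
    join, and is reported as a floating mark.

```python
HAMZA = "\u0621"

# Illustrative subset only: D = dual-joining, R = right-joining.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u0631": "R",  # REH
    "\u064A": "D",  # YEH
}

FORMS = {
    (False, False): "isol",
    (True, False): "fina",
    (False, True): "init",
    (True, True): "medi",
}


def shape(text, amphibious=True):
    """Return (character, form) pairs for a string of bare letters (no marks)."""
    # Step 1: find hamzas whose neighbours would join each other.
    floating = set()
    if amphibious:
        for i, ch in enumerate(text):
            if ch == HAMZA and 0 < i < len(text) - 1:
                before = JOINING_TYPE.get(text[i - 1])
                after = JOINING_TYPE.get(text[i + 1])
                if before == "D" and after in ("D", "R"):
                    floating.add(i)

    # Step 2: pick forms, skipping floating hamzas so they do not break joins.
    base = [i for i in range(len(text)) if i not in floating]
    forms = {i: "mark" for i in floating}
    for pos, i in enumerate(base):
        jt = JOINING_TYPE.get(text[i])  # None for U+0621 and unknown characters
        prev_jt = JOINING_TYPE.get(text[base[pos - 1]]) if pos > 0 else None
        next_jt = JOINING_TYPE.get(text[base[pos + 1]]) if pos + 1 < len(base) else None
        joins_prev = jt in ("D", "R") and prev_jt == "D"
        joins_next = jt == "D" and next_jt in ("D", "R")
        forms[i] = FORMS[(joins_prev, joins_next)]
    return [(text[i], forms[i]) for i in range(len(text))]


sample = "\u0628\u0621\u0628"  # artificial sequence: beh, hamza, beh
print([form for _, form in shape(sample, amphibious=False)])
print([form for _, form in shape(sample, amphibious=True)])
```

    The encoded text is identical in both calls; only the display logic differs, which is
    the sense in which the distinction can be handled without touching the properties of
    U+0621.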

    John Hudson

    PS. I may not respond to further discussion until after Christmas.

    Tiro Typeworks
    Gulf Islands, BC
    At the sunset of our days on earth, at the moment of
    death, we will be evaluated on the basis of our similarity
    or otherwise with the Baby who is to be born in the poor
    grotto of Bethlehem, since it is He who is the standard
    of measurement which God has given to humanity.
                        -- Benedict XVI
