Re: Chairless/Amphibious hamza

From: arno (
Date: Fri Dec 21 2007 - 03:47:47 CST

    John Hudson wrote:
    > arno wrote:
    >> I simply place hamza above a tatweel. From a printer's point of view
    >> there is no problem.
    > Tatweel, as an encoded character, is a hack inherited from metal
    > typesetting technology. It is the elevation of an elongation of a letter
    > into the status of a separate character. It is another example of a
    > confusion of roles between the encoding of text and the display of text.
    > ... There is the one hamza letter U+0621,
    > which most fonts and layout engines treat as disjoining. This is
    > the letter that Tom and I consider amphibious according to the grammar
    > of the script (which is not the same thing as the grammar of a
    > particular language). When the hamza occurs between two letters that
    > would normally join, those letters should join and the hamza should
    > float above between them. When the hamza occurs between two letters that
    > would not normally join, it sits between them. It is a single letter
    > that behaves as a floating mark in one context and as a spacing
    > character in another context.
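
    The contextual rule quoted above can be sketched in a few lines of
    Python. This is only an illustration under stated assumptions: the
    joining table is a small hand-written subset of the Arabic letters (not
    the full Unicode joining data), the names hamza_behaviour and classify
    are invented here, and vowel marks between letters are ignored.

```python
HAMZA = "\u0621"  # ARABIC LETTER HAMZA

# Assumption: a deliberately incomplete, hand-written joining table.
# Dual-joining letters connect on both sides; right-joining letters
# connect only to the preceding letter.
DUAL_JOINING = set(
    "\u0628\u062A\u062B\u062C\u062D\u062E\u0633\u0634\u0635\u0636\u0637"
    "\u0638\u0639\u063A\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u064A"
)
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def hamza_behaviour(prev, nxt):
    """Return 'floating' if prev and nxt would join with no hamza between them."""
    joins_forward = prev in DUAL_JOINING
    joins_backward = nxt in DUAL_JOINING or nxt in RIGHT_JOINING
    return "floating" if joins_forward and joins_backward else "spacing"


def classify(word):
    """List (index, behaviour) for every U+0621 in word."""
    results = []
    for i, ch in enumerate(word):
        if ch == HAMZA:
            prev = word[i - 1] if i > 0 else ""
            nxt = word[i + 1] if i + 1 < len(word) else ""
            results.append((i, hamza_behaviour(prev, nxt)))
    return results


if __name__ == "__main__":
    # Arbitrary letter sequences chosen for their joining types,
    # not offered as real words.
    print(classify("\u0634\u0621\u064A"))  # sheen, hamza, yeh
    print(classify("\u0631\u0621\u0627"))  # reh, hamza, alef
```

    Under these assumptions the first sequence reports a floating hamza
    (both neighbours would join) and the second a spacing hamza (reh does
    not join forward). A hamza at the start or end of a word is always
    reported as spacing.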

    Not quite. It is a floating mark that lengthens the connection between
    the letter before and the letter after -- or in other words: it is a
    mark above (or below) tatwîl.
    I agree with you that storing "amphib Hamza" instead of "tatweel +
    hamza above" OR "tatweel + hamza below" is more elegant, but it is much
    more difficult to interpret.
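
    For readers who want to see the stored alternatives side by side, here
    is a minimal Python sketch using only the standard unicodedata module.
    The variable names and the describe helper are illustrative choices,
    and no "amphibious hamza" character is assumed to exist: the
    single-letter case is simply U+0621.

```python
import unicodedata

HAMZA = "\u0621"        # ARABIC LETTER HAMZA
TATWEEL = "\u0640"      # ARABIC TATWEEL
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)
HAMZA_BELOW = "\u0655"  # ARABIC HAMZA BELOW (combining mark)


def describe(text):
    """Return (code point, character name, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


single_letter = HAMZA
tatweel_above = TATWEEL + HAMZA_ABOVE
tatweel_below = TATWEEL + HAMZA_BELOW

if __name__ == "__main__":
    for label, seq in [
        ("hamza letter", single_letter),
        ("tatweel + hamza above", tatweel_above),
        ("tatweel + hamza below", tatweel_below),
    ]:
        print(label, describe(seq))
```

    The two tatweel sequences are each two code points, which is why
    search, sort and spell check have to do extra work to treat them as
    equivalent to a plain hamza.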

    > Since the distinction is one of shaping and positioning, determined by
    > the shaping behaviour of adjacent letters, I believe that this is
    > properly addressed as a display issue and not as an encoding issue.

    You state this as a fact.
    But I wrote earlier that I disagree.
    Only when you have a fully vowelled text and a clearly defined locale
    will the adjacent characters sufficiently determine the position of hamza.

    > That
    > is, I do not believe a character-level distinction exists or should
    > exist between the hamza between two joining letters and the hamza
    > between two non-joining letters. The distinction is in the display.

    I do not believe that a character-level distinction exists or should
    exist between "fa with a dot above" and "fa with a dot below" or between
    "kaf with three dots above" and "keheh with three dots above." Locale
    should handle the proper choice of glyph.
    But in the absence of an application (??) that does the trick, I can
    (with a font like arabTypesetting or Scheherazade) store text that will
    show the proper behaviour -- search, sort and spell check are not amused
    about the extra work, but they cope.

    > Those different rules simply mean that we can't expect one font to
    > satisfy all users, but there is nothing unusual in that.

    No, I want *one* font that serves for writing both an Ottoman and a Q24
    mushaf, and that allows me to write words according to Egyptian AND
    according to Syrian rules; and most users want a font capable of writing
    the proper hamza in a basically unvowelled context.

    >> As stated before: Please show where hamza is disjoining and not
    >> disjoined by its surroundings.
    > This is the behaviour of every Arabic font on my system (and I have a
    > lot of them), except those that Tom Milo and Mirjam Somers made to work
    > with the DecoType ACE layout engine. I'm not saying that hamza should be
    > disjoining: I'm saying that this is what it does as implemented in most
    > typesetting solutions. It should be disjoined by its surroundings, but
    > instead it disjoins those surroundings. And millions of Arab readers
    > around the world must be living with this on a daily basis in newspapers
    > and magazines and books and advertising and text on television and
    > definitely on the Internet, and very few people seem to be conscious
    > that this is incorrect.

    I beg to disagree. If there were millions of Arab readers suffering
    from technology, they would complain, and maybe would solve the problem
    just as Pakistanis came up with correct Nastaliq fonts.

    In normal Arabic, there is no amphib hamza.
    Or can you show me Arab manuscripts from the time before the crimes
    committed by technology that show a non-disjoined hamza without chair?

    As far as I know -- but I would enjoy learning more -- there was no
    chairless hamza as first letter before 1924, and hamza above tatwîl
    occurs only in some copies of the qur'ân.

    > The sample sentences that I used in my online
    > illustration of the issue were taken from, if I recall correctly, the
    > BBC Arabic news; and unless most Arab readers are using fonts with which
    > I am unfamiliar they are seeing the same disjoining hamza as I am seeing.

    I have not seen sample sentences, just sample words; please tell me
    where I can find more.

    >>> and Unicode would encode it appropriately. As it is, Unicode has
    >>> inherited a typesetting model that is at odds with the script
    >>> tradition in a number of ways,
    >> wayS -- which ones? (Maybe you could send me that material off the list.)
    > The material is the fonts themselves.

    Sorry, you do not understand what I want to say.
    "Fonts themselves" are logically incapable of showing that Unicode's
    "typesetting model" is at odds with the script tradition.
    You would have to establish the rules of the script tradition AND show
    that they are at odds with the fonts.
    I am afraid that the script traditions are far too complex for you,
    Thomas and me to grasp.

    > In this regard there are two
    > kinds: those that were originally designed for previous typesetting
    > technologies, including e.g. hot metal Linotype composition, and which
    > have subsequently been converted or adapted to new generations of
    > technologies and now to Opentype; and those that are new designs but
    > which follow glyph sets determined by the existing fonts.
    > In both, one can see certain inherited mechanisms such as the idea of
    > representing the graphemic structure of the script via ligature glyphs,
    > which are a mechanism dating to metal typography, and which cannot
    > accurately represent all of the normative shapes in the grammar of the
    > individual script styles.

    Nonsense. They can. Probably it is not the most efficient, the most
    elegant or the most appropriate way to do it, but ligatures can do it.

    > Monotype famously had a nas'taliq font that
    > included more than 20,000 ligature glyphs, and they still couldn't
    > correctly display every possible combination of letters in a word segment.

    You participated in the making of the excellent ArabTypesetting font.
    I am afraid to say that it contains more than a hundred IMpossible
    ligatures. Do you know of a Latin font that has kerning pairs
    éß öñ žú ßT ? I guess not, but ArabTypesetting has lots of ligatures
    where the first letter occurs only in Kurdish and the second one only in
    Urdu, and even ligatures of the same letter, first in the Urdu tradition
    and next in the Sindhi tradition.
    Even some of the Unicode presentation forms are not used. If one does
    not include all theoretically possible ligatures, but only those
    existing in the real world, it is quite possible.

    > And the King Fuad edition of 1924 itself displays limitations of the
    > typesetting technology used, which also relied upon ligature glyphs and,
    > hence, was able to accurately represent only some of the naskh script on
    > which the type style was based. For instance, according to the grammar
    > of the naskh script, the medial form of jim-shape letters always joins
    > from the top, as is seen consistently in the exemplars of scribes and
    > calligraphers;

    Not true! The Princely Printing House that produced the King Fuad
    edition had many more ligatures (jîm/ha/xa ligatures among them), but
    they freely chose not to use them.
    You should not confuse a courtly Ottoman writing tradition with the
    rules of the Arabic script. If you have only been to Egypt, Syria and
    Turkey you might get the impression that there is one best practice and
    poor peasant styles. But if your view is a bit wider, you will discover
    that Thomas Milo is wrong on most points. (Not wrong in analyzing the
    books in front of him, but in assuming that he is analyzing THE SCRIPT.)

    > I favour this solution:
    > Leave the combining mark characters as they are; they seem to function
    > fine and the decompositions are straightforward.
    > Leave the Kazakh high hamza as it is and ignore it completely for Arabic
    > language text.
    > Treat U+0621 as an amphibious character at the display level.

    > I don't *like* having to handle the joining behaviour of letters
    > adjacent to hamza contextually in font lookups. Ideally I shouldn't have
    > to. But changing the properties of U+0621 so that adjacent letters are
    > made joining by compliant shaping engines would break a lot of software
    > and pretty much all current fonts. I can't see Unicode doing that, so I
    > think we're obliged to look for solutions at the display level.

    Does that mean that you do not think that ADDING "amphib/chairless
    hamza" is a good idea?
