Re: Chairless/Amphibious hamza

From: John Hudson ([email protected])
Date: Fri Dec 21 2007 - 02:38:32 CST

Next message: John Hudson: "Re: [OT] Re: CLDR Usage of Gregorian Calendar Era Terms: BC and AD -- Can we please have "CE" and "BCE" ?"

Previous message: Jukka K. Korpela: "Re: CLDR Usage of Gregorian Calendar Era Terms: BC and AD -- Can we please have "CE" and "BCE" ?"
In reply to: arno: "Re: Chairless/Amphibious hamza"
Next in thread: arno: "Re: Chairless/Amphibious hamza"
Reply: arno: "Re: Chairless/Amphibious hamza"

arno wrote:

> I simply place hamza above a tatweel. From a printers point of view
> there is no problem.

Tatweel, as an encoded character, is a hack inherited from metal typesetting technology.
It is the elevation of an elongation of a letter into the status of a separate character.
It is another example of a confusion of roles between the encoding of text and the display
of text.

>> The chairless hamza -- what Tom Milo calls the 'amphibious hamza'

> Let's not mix up things!
> The "amphibious hamza" does away -- at least for the Arabic language --
> with all hamzas presently encoded -- or did I get it wrong?

I don't think I'm mixing things up. There is the one hamza letter U+0621, which most fonts
and layout engines treat as disjoining. This is the letter that Tom and I consider
amphibious according to the grammar of the script (which is not the same thing as the
grammar of a particular language). When the hamza occurs between two letters that would
normally join, those letters should join and the hamza should float above the join between them.
When the hamza occurs between two letters that would not normally join, it sits between
them. It is a single letter that behaves as a floating mark in one context and as a
spacing character in another context.
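
To make the rule concrete, here is a minimal sketch in Python of the contextual decision
described above. It is not taken from any font or layout engine: the function names are
hypothetical, and the joining classes are a hand-written subset covering only the basic
Arabic letters U+0622..U+064A, so treat it as an illustration under those assumptions.

```python
import unicodedata

HAMZA = "\u0621"    # ARABIC LETTER HAMZA (the chairless hamza)
TATWEEL = "\u0640"  # joins on both sides

# Assumption: simplified joining classes for the basic Arabic letters only.
# Right-joining letters connect to the preceding letter but never to the next.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")
# Everything else in U+0626..U+064A (minus tatweel) is treated as dual-joining.
DUAL_JOINING = {chr(cp) for cp in range(0x0626, 0x064B)} - RIGHT_JOINING - {TATWEEL}


def joins_to_next(ch):
    """True if the letter can connect to the letter that follows it."""
    return ch in DUAL_JOINING or ch == TATWEEL


def joins_to_previous(ch):
    """True if the letter can connect to the letter that precedes it."""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING or ch == TATWEEL


def neighbour(text, index, step):
    """Nearest base character in the given direction, skipping combining marks."""
    i = index + step
    while 0 <= i < len(text):
        if unicodedata.category(text[i]) != "Mn":
            return text[i]
        i += step
    return None


def hamza_behaviour(text, index):
    """Classify the U+0621 at `index` as 'floating' or 'spacing'."""
    if text[index] != HAMZA:
        raise ValueError("no U+0621 at this index")
    before = neighbour(text, index, -1)
    after = neighbour(text, index, +1)
    # The hamza floats only when its neighbours would join if it were absent.
    if before is not None and after is not None \
            and joins_to_next(before) and joins_to_previous(after):
        return "floating"
    return "spacing"


print(hamza_behaviour("\u0628\u0621\u0628", 1))              # beh, hamza, beh -> floating
print(hamza_behaviour("\u0642\u0631\u0627\u0621\u0629", 3))  # hamza after alef -> spacing
```

The second call is the word qaf, reh, alef, hamza, teh marbuta: alef never joins to a
following letter, so the hamza sits between alef and teh marbuta as a spacing character.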

Since the distinction is one of shaping and positioning, determined by the shaping
behaviour of adjacent letters, I believe that this is properly addressed as a display
issue and not as an encoding issue. That is, I do not believe a character-level
distinction exists or should exist between the hamza between two joining letters and the
hamza between two non-joining letters. The distinction is in the display.

> I hope you are not only thinking about these things, but studying them.
> So you must have some material about the influence of technology on
> hamza writing. My impression is that -- speaking for Arabic only -- the
> joining behaviour of the letters and vowels around the hamza determined
> its chair, not technology.

The joining behaviour of the letters and vowels around the hamza *should* determine its
form and positioning. But the technology has largely determined that hamza has become a
disjoining character, because that is how the character encoded as U+0621 has behaved in
most typesetting systems for many years now.

>> Returning to my point above: I think this can be looked at as a
>> display issue, in which case the question becomes whether font formats
>> and layout engines have suitable mechanisms to handle the contextual
>> behaviour. OpenType does, and I believe Apple's AAT and SIL's Graphite
>> do also. Tom Milo's ACE technology certainly does, as he has already
>> implemented this.

> I doubt it. Since there are different rules on the chair of hamza in
> different lands and times -- not to speak of non-Arabic languages --
> it's too hard a task even for Thomas Milo.

Those different rules simply mean that we can't expect one font to satisfy all users, but
there is nothing unusual in that. If the rules can be described, then they can be
implemented in typography. The problem facing Arabic typography is that the rules have not
been adequately described, and hence the implementations are not guided by the script
tradition but by reference to previous implementations (metal type, phototype, various
early digital type formats). And so one ends up with peculiarities like the large set of
metal ligature forms in the Arabic Presentation Forms block and the tatweel character: the
elevation of display mechanisms particular to specific technologies into the status of
encoded characters. And over the past few years I have converted and mastered dozens of
fonts from previous digital formats into OpenType for various clients, and in all of them
the chairless hamza (U+0621) is presumed to be a disjoining letter because that is how the
previous technologies had handled it.

>> In an ideal world, the amphibious hamza would never have developed a
>> modern usage as a disjoining character,

> As stated before: Please show where hamza is disjoining and not
> disjoined by surrounding?

This is the behaviour of every Arabic font on my system (and I have a lot of them), except
those that Tom Milo and Mirjam Somers made to work with the DecoType ACE layout engine.
I'm not saying that hamza should be disjoining: I'm saying that this is what it does as
implemented in most typesetting solutions. It should be disjoined by its surroundings, but
instead it disjoins those surroundings. And millions of Arab readers around the world must
be living with this on a daily basis in newspapers and magazines and books and advertising
and text on television and definitely on the Internet, and very few people seem to be
conscious that this is incorrect. The sample sentences that I used in my online
illustration of the issue were taken from, if I recall correctly, the BBC Arabic news; and
unless most Arab readers are using fonts with which I am unfamiliar they are seeing the
same disjoining hamza as I am seeing.

>> and Unicode would encode it appropriately. As it is, Unicode has
>> inherited a typesetting model that is at odds with the script
>> tradition in a number of ways,

> wayS -- which ones? (Maybe you could send me that material off the list.)

The material is the fonts themselves. In this regard there are two kinds: those that were
originally designed for previous typesetting technologies, including e.g. hot metal
Linotype composition, and which have subsequently been converted or adapted to new
generations of technologies and now to OpenType; and those that are new designs but which
follow glyph sets determined by the existing fonts.

In both, one can see certain inherited mechanisms such as the idea of representing the
graphemic structure of the script via ligature glyphs, which are a mechanism dating to
metal typography, and which cannot accurately represent all of the normative shapes in the
grammar of the individual script styles. Monotype famously had a nasta'liq font that
included more than 20,000 ligature glyphs, and they still couldn't correctly display every
possible combination of letters in a word segment.

And the King Fuad edition of 1924 itself displays limitations of the typesetting
technology used, which also relied upon ligature glyphs and, hence, was able to accurately
represent only some of the naskh script on which the type style was based. For instance,
according to the grammar of the naskh script, the medial form of jim-shape letters always
joins from the top, as is seen consistently in the exemplars of scribes and calligraphers;
indeed, the resulting pattern of three jim-shape letters in succession is particularly
dramatic and exemplars including it are among the most often reproduced in books on
Islamic calligraphy. But the typeface of the King Fuad edition, if I recall correctly,
contains only ligatures for two jim-shapes in succession; so when three jim-shapes are set
in succession with this type the script system breaks down and two are joined vertically
while one is joined horizontally.

> To end constructively: If you want to go for a "chairless hamza"
> you MUST restrict the allowed behaviour of existing characters.
> Decomposition between hamza above/below and the four precomposed letters
> (and high hamza and its three precomposed letters) is fairly
> straightforward, but chaired and chairless hamza would have to be
> separated or if that's too difficult, we must go for the ONE hamza.

I'm not sure what you mean by 'restrict the allowed behaviour of existing characters'.

I favour this solution:

Leave the combining mark characters as they are; they seem to function fine and the
decompositions are straightforward.

Leave the Kazakh high hamza as it is and ignore it completely for Arabic language text.

Treat U+0621 as an amphibious character at the display level.

I don't *like* having to handle the joining behaviour of letters adjacent to hamza
contextually in font lookups. Ideally I shouldn't have to. But changing the properties of
U+0621 so that adjacent letters are made joining by compliant shaping engines would break
a lot of software and pretty much all current fonts. I can't see Unicode doing that, so I
think we're obliged to look for solutions at the display level.
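
As a rough illustration of what a display-level solution has to achieve, the following
Python sketch compares the positional forms selected under the two models: hamza as a
disjoining letter, and hamza treated as amphibious. This is not OpenType lookup code, only
a simulation of the intended outcome; the function name `positional_forms`, the form
labels, and the simplified joining classes (basic Arabic letters only) are assumptions
made for the example.

```python
import unicodedata

HAMZA = "\u0621"  # ARABIC LETTER HAMZA

# Assumption: simplified joining classes for the basic Arabic letters only.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")
DUAL_JOINING = {chr(cp) for cp in range(0x0626, 0x064B)} - RIGHT_JOINING - {"\u0640"}
JOINS_BACK = DUAL_JOINING | RIGHT_JOINING


def positional_forms(word, amphibious):
    """Return (letter, form) pairs for the base letters of `word`.

    Forms are 'isol', 'init', 'medi', 'fina', or 'mark' for a floating hamza.
    """
    letters = [ch for ch in word if unicodedata.category(ch) != "Mn"]

    # Under the amphibious model, find each hamza whose neighbours would join.
    floating = set()
    if amphibious:
        for i in range(1, len(letters) - 1):
            if (letters[i] == HAMZA
                    and letters[i - 1] in DUAL_JOINING
                    and letters[i + 1] in JOINS_BACK):
                floating.add(i)

    # Shape the remaining letters as if the floating hamzas were not there.
    chain = [i for i in range(len(letters)) if i not in floating]
    forms = {}
    for pos, i in enumerate(chain):
        ch = letters[i]
        prev = letters[chain[pos - 1]] if pos > 0 else None
        nxt = letters[chain[pos + 1]] if pos + 1 < len(chain) else None
        joins_prev = prev in DUAL_JOINING and ch in JOINS_BACK
        joins_next = ch in DUAL_JOINING and nxt in JOINS_BACK
        if joins_prev and joins_next:
            forms[i] = "medi"
        elif joins_prev:
            forms[i] = "fina"
        elif joins_next:
            forms[i] = "init"
        else:
            forms[i] = "isol"

    return [(ch, "mark" if i in floating else forms[i])
            for i, ch in enumerate(letters)]


word = "\u0628\u0621\u0628"  # beh, hamza, beh
print([form for _, form in positional_forms(word, amphibious=False)])
print([form for _, form in positional_forms(word, amphibious=True)])
```

With hamza as a disjoining letter all three glyphs come out isolated; with the amphibious
treatment the two beh letters take initial and final forms and the hamza is positioned as
a mark over their join.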

John Hudson

PS. I may not respond to further discussion until after Christmas.

-- 
Tiro Typeworks        www.tiro.com
Gulf Islands, BC      [email protected]
At the sunset of our days on earth, at the moment of
death, we will be evaluated on the basis of our similarity
or otherwise with the Baby who is to be born in the poor
grotto of Bethlehem, since it is He who is the standard
of measurement which God has given to humanity.
                    -- Benedict XVI


This archive was generated by hypermail 2.1.5 : Fri Dec 21 2007 - 02:41:16 CST