Re: Latin ligatures and Unicode

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Dec 22 1999 - 10:23:08 EST


"Reynolds, Gregg" wrote:

> But is that an entirely accurate description of the semantics of ZWJ in
> Arabic? All ZWJ means with respect to joining is "thou shalt join";

Actually, not. See below.

> it doesn't say anything about which joining form to use; that is determined
> syntactically.

True.

> As an example of how ZWJ could be put to good use with no notion of
> "deceiving the renderer", consider that lexigraphic words in Arabic
> frequently contain multiple lexemes. For example, kitAbuhA = kitAbu, book,
> + suffix hA, of her = her book. In the absence of a proper codepoint with
> "LEXEME DELIMITER" semantics, I can use ZWJ to provide such semantics
> without affecting the rendering and search/sort behavior of standard Unicode
> software:

You *can* do so, but that is not the standard use of ZWJ. ZWNBSP would probably
serve you better.
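
To make that concrete, here is a minimal Python sketch of marking a lexeme
boundary with an invisible format character and stripping it again before
comparison. The word is kitAbuhA written without vowel marks. The helper names
mark_lexemes and strip_marks are made up for illustration, and whether any
particular renderer or collation library ignores these characters is an
assumption to verify, not something this code demonstrates.

```python
import unicodedata

ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE


def mark_lexemes(parts, delimiter=ZWNBSP):
    """Join lexemes with an invisible delimiter (hypothetical helper)."""
    return delimiter.join(parts)


def strip_marks(text):
    """Drop every format (Cf) character so marked and plain text compare equal."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# kitAb + hA, unvowelled: KAF TEH ALEF BEH + HEH ALEF
stem = "\u0643\u062A\u0627\u0628"
suffix = "\u0647\u0627"

plain = stem + suffix
marked = mark_lexemes([stem, suffix])

# The delimiter adds one code point and can be used to recover the lexemes.
print(len(plain), len(marked))              # 6 7
print(marked.split(ZWNBSP) == [stem, suffix])  # True
print(strip_marks(marked) == plain)         # True
```

Software that does not already ignore such characters would need a step like
strip_marks before searching or sorting.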

> In this example, ZWJ falls between two characters of the joining class; it
> has no effect on their form, and the ligation is formed.

Then there is no point in it, at least not according to the standard definitions.

> Or, define it as a non-printing character of the
> dual-joining class.

This is probably the best definition. ZWNJ, then, is an invisible character
of the non-joining class.

> I guess I'd have to differ with you on this interpretation. Seems
> reasonable to me to talk of joining forms of just about anything _within a
> local context_. Where 'fi' ligatures exist, the 'f' of the ligated form is
> not the same form as isolated 'f'.

Agreed. I think the trouble comes from the word JOINER in the names of ZWJ
and ZWNJ. These characters do not "join" anything; rather, they provoke
the shaping of surrounding characters by creating a pseudo-context.
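
A toy model may help here. The Python sketch below takes the working
definition discussed above (ZWJ as an invisible dual-joining character, ZWNJ
as an invisible non-joining one) and picks a contextual form for each visible
letter from its neighbours alone. The three-letter joining table and the
function names are assumptions made for illustration. A real shaping engine
also has to deal with combining marks, ligatures and fonts, and the Unicode
data files class ZWJ as join-causing rather than dual-joining, which behaves
the same way in this simplified model.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
BEH, HEH, ALEF = "\u0628", "\u0647", "\u0627"

# Toy joining classes: D = dual-joining, R = right-joining, U = non-joining.
# ZWJ and ZWNJ are classed as in the working definition, not as in the UCD.
JOINING = {BEH: "D", HEH: "D", ALEF: "R", ZWJ: "D", ZWNJ: "U"}

FORMS = {
    (False, False): "isolated",
    (False, True): "initial",
    (True, True): "medial",
    (True, False): "final",
}


def joins_forward(ch):
    """Can ch connect to the character after it (logical order)?"""
    return JOINING.get(ch, "U") == "D"


def joins_backward(ch):
    """Can ch connect to the character before it (logical order)?"""
    return JOINING.get(ch, "U") in ("D", "R")


def shape(text):
    """Return the contextual form chosen for each visible letter."""
    forms = []
    for i, ch in enumerate(text):
        if ch in (ZWJ, ZWNJ):
            continue  # invisible: contributes context, has no form itself
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        to_prev = joins_forward(prev) and joins_backward(ch)
        to_next = joins_forward(ch) and joins_backward(nxt)
        forms.append(FORMS[(to_prev, to_next)])
    return forms


print(shape(BEH + HEH))         # ['initial', 'final']
print(shape(BEH + ZWJ + HEH))   # ['initial', 'final']   (ZWJ changes nothing)
print(shape(BEH + ZWNJ + HEH))  # ['isolated', 'isolated']
print(shape(HEH + ZWJ))         # ['initial']   (pseudo-context on one side)
print(shape(ZWJ + HEH + ZWJ))   # ['medial']    (pseudo-context on both sides)
```

Nothing is "joined" by the ZWJ itself: it only supplies the neighbour that the
letter would otherwise lack.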

> While we're at it, we also need a way to stretch the space between two
> adjacent Arabic letterforms that don't join, but without introducing word
> separation. Tatweel would work just fine if marking semantics were made
> dependent on syntactic context - i.e. it should not be considered
> "join-causing"; its semantics should simply be "stretch whatever's there,
> be it whitespace or a ligating stroke."

That is the function of NBSP.

> Also needed: a means of placing diacritics over null space - e.g. over space
> or a ligating stroke. ZWJ would be good for this, except for the part about
> zero width. Anyway, that's a subject for a different thread and I gotta get
> back to the grindstone.

The convention is to put the diacritic following SP; NBSP would work equally
well, sometimes better.
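
As a small illustration of that convention, the Python sketch below builds
such a sequence: a space-like base followed by the combining mark. The helper
name isolated_mark is hypothetical, and how the result is actually drawn
depends on the renderer and font, which this code does not test.

```python
import unicodedata

SP = "\u0020"     # SPACE
NBSP = "\u00A0"   # NO-BREAK SPACE
FATHA = "\u064E"  # ARABIC FATHA, a combining mark


def isolated_mark(mark, base=SP):
    """Return base + mark, the conventional way to show a mark by itself."""
    if unicodedata.combining(mark) == 0:
        raise ValueError("not a combining mark: %r" % mark)
    return base + mark


with_sp = isolated_mark(FATHA)
with_nbsp = isolated_mark(FATHA, base=NBSP)

print([unicodedata.name(ch) for ch in with_sp])    # ['SPACE', 'ARABIC FATHA']
print([unicodedata.name(ch) for ch in with_nbsp])  # ['NO-BREAK SPACE', 'ARABIC FATHA']
```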

-- 

Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


