Re: Clarification of Arabic joining classes

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Oct 09 2000 - 20:46:46 EDT


Roozbeh asked:

> In TUS 3.0, page 192, Table 8-2, it is said all "format mark"s are
> considered transparent. What is the exact definition of "format mark"?
> If it's the class "Cf", is ZWNBSP included?
>

The author of this text intended "format marks" to mean the format
characters of the bidirectional algorithm, which would include:

200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;
200F;RIGHT-TO-LEFT MARK;Cf;0;R;;;;;N;;;;;
202A;LEFT-TO-RIGHT EMBEDDING;Cf;0;LRE;;;;;N;;;;;
202B;RIGHT-TO-LEFT EMBEDDING;Cf;0;RLE;;;;;N;;;;;
202C;POP DIRECTIONAL FORMATTING;Cf;0;PDF;;;;;N;;;;;
202D;LEFT-TO-RIGHT OVERRIDE;Cf;0;LRO;;;;;N;;;;;
202E;RIGHT-TO-LEFT OVERRIDE;Cf;0;RLO;;;;;N;;;;;

It clearly cannot mean *all* characters with the General Category "Cf",
because the following two:

200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;

are explicitly Non-joining and Join-causing, respectively.
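
A minimal sketch of why General Category alone cannot settle the
question, assuming the records quoted above follow the usual
semicolon-separated UnicodeData.txt layout (field 2 is the General
Category, field 4 the bidirectional class). The helper name
parse_record and the dictionary keys are hypothetical, chosen only
for illustration:

```python
# A few records copied from this message (UnicodeData.txt style).
RECORDS = """\
200C;ZERO WIDTH NON-JOINER;Cf;0;BN;;;;;N;;;;;
200D;ZERO WIDTH JOINER;Cf;0;BN;;;;;N;;;;;
200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;
202A;LEFT-TO-RIGHT EMBEDDING;Cf;0;LRE;;;;;N;;;;;
"""


def parse_record(line):
    """Split one record and keep only the fields discussed here."""
    fields = line.split(";")
    return {
        "code_point": int(fields[0], 16),  # hexadecimal code point
        "name": fields[1],
        "general_category": fields[2],     # "Cf" for every record above
        "bidi_class": fields[4],           # L, R, LRE, BN, ...
    }


parsed = [parse_record(line) for line in RECORDS.splitlines()]

# Every record shares the same General Category, so "Cf" by itself
# cannot separate ZWNJ and ZWJ from the bidi format marks.
categories = {record["general_category"] for record in parsed}

for record in parsed:
    print(f'{record["code_point"]:04X} {record["general_category"]} '
          f'{record["bidi_class"]:>3} {record["name"]}')
```

All four records report "Cf", while the joining behavior of the first
two differs from that of the others, so the joining class has to come
from somewhere other than the General Category.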

However, I think it is safe to say that:

FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;

should also be considered Transparent, rather than Non-joining or
Join-causing. Despite its name, the ZWNBSP is *not* a space character.
It can be used to indicate the lack of a break opportunity in
a word, but that is orthogonal to any consideration of cursive
joining.
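
The reading given above can be sketched as a small lookup. This is
only an illustration of the interpretation in this message, not a
normative table from the standard; the function names joining_class
and is_format_char and the set names are hypothetical, and characters
not discussed here are deliberately left unclassified:

```python
import unicodedata

# Classes as described in this message, not taken from a data file.
NON_JOINING = {"\u200C"}    # ZERO WIDTH NON-JOINER
JOIN_CAUSING = {"\u200D"}   # ZERO WIDTH JOINER
BIDI_FORMAT_MARKS = {
    "\u200E", "\u200F",                        # LRM, RLM
    "\u202A", "\u202B", "\u202C",              # LRE, RLE, PDF
    "\u202D", "\u202E",                        # LRO, RLO
}
ZWNBSP = "\uFEFF"           # ZERO WIDTH NO-BREAK SPACE


def is_format_char(ch):
    """True if the character has General Category Cf."""
    return unicodedata.category(ch) == "Cf"


def joining_class(ch):
    """Joining class for the Cf characters discussed here, else None."""
    if ch in NON_JOINING:
        return "Non-joining"
    if ch in JOIN_CAUSING:
        return "Join-causing"
    if ch in BIDI_FORMAT_MARKS or ch == ZWNBSP:
        return "Transparent"
    return None  # not covered by this message


for ch in sorted(NON_JOINING | JOIN_CAUSING | BIDI_FORMAT_MARKS | {ZWNBSP}):
    print(f"U+{ord(ch):04X} Cf={is_format_char(ch)} {joining_class(ch)}")
```

Keeping ZWNJ and ZWJ in their own sets makes the exceptions explicit,
and returning None for everything else avoids implying a class for
characters this message does not address.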

I agree that there is some ambiguity in the text here, which should
be addressed in the next edition of the standard.

--Ken
 


