Feedback on PRI #231: Bidi Parentheses Algorithm (Was: PRI #231: Bidi Parenthesis Algorithm) from CE Whitehead on 2012-07-13 (Unicode Mail List Archive)

From: CE Whitehead <cewcathar_at_hotmail.com>
Date: Fri, 13 Jul 2012 22:13:04 -0400

Hi. I realize that the bidi parentheses algorithm is not currently being discussed on the list, but I wanted to cc the list with my feedback. (I've already sent it to Unicode using the form, but I wanted to make "double sure" that my feedback gets to the right place; also, I've made a few edits to the feedback I submitted, so the comments here may be a little clearer.)

First, apologies for mentioning happy and sad faces, including these four (: , :) , :( , ): -- these will not normally be ordered as paired parentheses and would be ignored by the algorithm!!! This is fine I think.
On the same note, although I'm not completely sure about curly brackets { }
(please see the discussion of single curly braces in legal documents:
http://www.oooforum.org/forum/viewtopic.phtml?t=53089 ),
in such cases the bidi parentheses rule will simply not be applied, so I believe that such uses are again no problem! A similar thing happens with closing parentheses that are used after numbers and letters --
1).
2.)
3.)
But again these should not normally be a problem.

Thus go ahead with the following four sets of braces:
(), [], <>, and {}.
(I am not sure about other brackets and braces however -- in miscellaneous symbols and elsewhere).

Second,

** IMO, the algorithm should be part of Unicode's core,
as Unicode core's previous way of handling braces should be improved/corrected in the core,
even though (IMO again) the current rules HL4 and HL5 are also fine,
and do enable applications to fix/tweak the basic bidi algorithm.
What I mean is that, if an app ignores HL4 and HL5 and simply applies the Unicode bidi algorithm, the result could be mismatched brackets (in some cases); further, the rules in the core should be fixed in the core (IMO again), so long as they are only fixed for marks that are universal and not language-specific. (I don't see a problem with backwards compatibility -- any workarounds should still work.)

** Also I agree it's best to locate the matching brackets before applying the rest of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself:
"If an open parenthesis is found, push it onto a stack and continue the scan. If a close parenthesis is found, check if the stack is not empty and the close parenthesis is the other member of the mirrored pair for the character on the top of the stack. If so, pop the stack and continue the scan; else return failure. If the end of the paragraph is reached, return success if the stack is empty; else return failure. Success implies that all open and close parentheses, if any, in a paragraph are matched correctly. Failure implies that there are one or more mismatched paired punctuation marks in a run and therefore the handling under the parenthesis algorithm will not be attempted."
** The above is fine, IMO
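
For concreteness, the scan quoted above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the PAIRS table is limited to the four ASCII bracket pairs I listed earlier (the draft's real set of mirrored pairs may differ), and the name brackets_match is made up for illustration.

```python
# Sketch of the matching scan quoted above. PAIRS is an assumption: it holds
# only the four ASCII pairs named earlier in this message, not the draft's
# actual list of mirrored pairs.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}
CLOSERS = set(PAIRS.values())


def brackets_match(paragraph):
    """Return True (success) if every bracket in the paragraph is paired."""
    stack = []
    for ch in paragraph:
        if ch in PAIRS:
            # Open bracket: push it and continue the scan.
            stack.append(ch)
        elif ch in CLOSERS:
            # Close bracket: it must mirror the bracket on top of the stack.
            if not stack or PAIRS[stack[-1]] != ch:
                return False
            stack.pop()
    # End of paragraph: success only if nothing is left open.
    return not stack
```

Under this sketch a paragraph such as "a (b [c] d)" succeeds, while one containing a lone ":)" or "1)" fails, which (as I read the quoted text) means the parenthesis handling would simply not be attempted for that paragraph.
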
"The rationale for following the embedding level in the normal case is that the text segment enclosed by 120 the paired punctuation marks will conform to the progression of other text segments in the writing direction. In the exception cases, the rationale to follow the opposite direction is based on context being established between the enclosed and adjacent segments with the same direction."
** Agreed, yes, embedding level should be followed in normal case, albeit for both brackets.
"Other neutral types adjacent to paired punctuation marks are resolved subsequent to resolving the paired punctuation marks themselves, and will therefore be influenced by that resolution."
** Agreed again, yes, so far so good.
"The directionality of the enclosed content is opposite the embedding direction, and at least one neighbor has a bidi level opposite to the embedding direction O(O)E, E(O)O, or O(O)O."
"*N0. Paired punctuation marks take the embedding direction if the enclosed text contains a strong type of the same direction. Else, if the enclosed text contains a strong type of the opposite direction and at least one external neighbor also has that direction the paired punctuation marks take the direction opposite the embedding direction."
** I disagree with the above statements.

R(R)L -> R -- that is, with embedding ltr -- o.k.
L(R)R -> R -- that is, with embedding ltr -- No; these brackets should take the ltr directionality; that is, they should take the directionality of the embedding level if text of that same directionality immediately precedes the opening paren (IMO again, but see the examples below).

** Same problem in an rtl embedding environment:
L(L)R -> L with embedding rtl -- O.k. the text that precedes the parens is ltr, as is the text in parens, so fine, let the directionality be different from that of embedding directionality.
R(L)L -> L with embedding rtl -- No; same problem as above in the ltr embedding environment! The directionality of the embedding is the same as the directionality of the text immediately preceding the parentheses. I think this sets the reader's expectation for the display of the parens!
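
To make the four cases above concrete, here is a rough Python sketch that compares the draft N0 as quoted with the variant I am arguing for. Everything in it is an assumption for illustration: the function names are made up, directions are reduced to the letters "L", "R" and "N", returning None stands for "left to the later neutral rules", and the variant is only one reading of my comments in 3.1 (it does not try to cover any other cases).

```python
def opposite(direction):
    """Return the other strong direction."""
    return "R" if direction == "L" else "L"


def draft_n0(embedding, before, inside, after):
    """Bracket direction under the draft N0 as quoted above.

    embedding, before, after: "L", "R" (or "N" for a neutral neighbor).
    inside: string of the strong types enclosed by the brackets, e.g. "RL".
    Returns None when the quoted rule does not decide the case.
    """
    if embedding in inside:
        return embedding
    opp = opposite(embedding)
    if opp in inside and opp in (before, after):
        return opp
    return None


def preceding_text_variant(embedding, before, inside, after):
    """Hypothetical variant: text before the opening paren wins if it
    already has the embedding direction; otherwise fall back to the draft."""
    if before == embedding:
        return embedding
    return draft_n0(embedding, before, inside, after)
```

With these definitions the draft rule gives R for both R(R)L and L(R)R in an ltr embedding, whereas the variant gives R for R(R)L but L for L(R)R; the rtl cases L(L)R and R(L)L come out the same way with the letters swapped.
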

So for example (note: as is your convention, upper case letters designate RTL characters/text and lower case designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
O.k., if the embedding directionality of this text is ltr, these parentheses can be displayed as ltr since there is enough ltr text both inside and outside of them.
However, if the embedding is rtl, the rtl text immediately preceding the parentheses makes me expect to see the parentheses displayed as rtl;
thus in this case should not the directionality of the embedding and preceding text determine the directionality of the whole, even for the second set of parentheses? (sorry for the commas in my example below of "proper" bidi layout; I can't find an rtl comma on my keyboard):

=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET
(** Thus, I think that, together with the directionality of the embedding level, the text that logically precedes the parens is critical to the determination of the directionality of the parens.)

3.2. Also, sometimes there is no adjacent text on one side of a set of brackets or the other: although fortunately parentheses rarely begin a text block (except in programming, and in my writing), they often end a text block, followed by a neutral punctuation mark:
L(R)N
{ ? Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding the above L(R)N should clearly be ltr. {And R(L)N should be rtl in an rtl text.}
R(R)N
* In an ltr text the above should run rtl nevertheless! {And L(L)N should run ltr in an rtl embedding!}
These two cases above seem to me to be two obvious cases!!

However, for the next two cases the solution is not so obvious to me:
L(R)N in an rtl context/embedding
* (should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (same question; should it remain ltr? Probably.)

The following cases I am again more sure of:
L(RL)N in an rtl context/embedding
* I would make the text just above ltr
R(RL)N in an ltr context/embedding
* I would make the text just above rtl

--- My Comments Have Been Snipped Here --

4. A note: Dictionary definitions, with phonetic transcriptions and examples of usage, are one place where the bidi parentheses algorithm might apply a lot. Language texts are another.

Best,
--C.E. Whitehead
cewcathar_at_hotmail.com

Received on Fri Jul 13 2012 - 21:18:43 CDT
