RE: comments on Unicode PRI #231 from CE Whitehead on 2012-08-02 (Unicode Mail List Archive)

From: CE Whitehead <cewcathar_at_hotmail.com>
Date: Thu, 2 Aug 2012 15:16:52 -0400

public-i18n-bidi_at_w3.org

Thanks, Mati, for cc-ing the list.

I too sent some comments on the bidi parentheses algorithm (alas, they have yet to be posted at http://www.unicode.org/review/pri231/ even though I sent them in time; only my original idiotic comment has been posted; for my more recent comments, see below). I am concerned about what happens when the text preceding the opening parenthesis is ltr or rtl but the text following the closing parenthesis is neutral, and both directionalities are present within the parentheses. I do think the algorithm should address this case, and the case where the enclosed text is all neutral, clearly.
I've commented on a few of your comments (I'm not a developer, so I responded where I could as a user). I think your solution of taking the directionality of the embedding in unclear cases is better than mine, so my own comments are appended at the end of this email just for reference.

From: matitiahu.allouche_at_gmail.com
To: public-i18n-bidi_at_w3.org
Date: Mon, 30 Jul 2012 00:54:58 +0300
Subject: comments on Unicode PRI #231

> The Unicode Technical Committee (UTC) has published a PRI (Public Review Issue) about a proposed Bidi Parenthesis Algorithm (see
> http://www.unicode.org/review/pri231/ ). I submit below the comments I have sent to UTC. Although the closing date for comments
> passed a few days ago, anybody with something to contribute should do so promptly, and the comments will probably be accepted
> since there has not been a lot of traffic about this PRI.
> Regards, Mati
> <start of my comments>
> Ooops! I am past the closing date. I hope the comments below will be considered nevertheless.
> 4) Line 115 mentions "the directionality of the enclosed content". It is not clear what this directionality is when the content includes
> mixed LTR and RTL text. I think this case should get a separate bullet here.
> 5) For completeness, rule N0 should specify what happens when the enclosed text is all N, even if to say that the BPA does not affect
> this case.
Yes, agreed, I think so, too.
> . . .
> 7) Is a solution to the current problem of mismatched parentheses desirable? I am not sure, because of the following reasons:
> a. The UBA is already quite complex. The BPA would add still more complexity. Proof is that, if my comments 1-3 and 6 above are
> founded, even the author of the proposal has missed some fine points. And if my comments are not founded, I am myself the one who
> got confused, despite the fact that I have more experience in bidi matters than the average person.
Your comments above refer to the current algorithm, right, whereas the new algorithm would match these in all cases and thus reduce complexity, right?
> b. Consider a text editor implementing the UBA and BPA by transforming the logical text to visual display after each keystroke. When
> entering an open paren, the BPA will not kick in, since it is not paired. When entering the closing paren, the BPA will kick in, possibly
> modifying the display of text around the opening paren, which may be a few lines away from the typing location.
> The UBA also has effects of modifying the appearance of text already entered, but it is always in the close neighborhood of the typing
> location.
Hmm, this already happens when I type at, say, Facebook (sorry to mention an example) in two directions; for example, if I type text in Arabic, then an English definition or a romanization, and then another word in the definition separated by a punctuation mark such as a dash, text gets moved around before my eyes. So I think this is not a major argument against revising the algorithm: people are quite used to this, and having text display properly after a little typing is something bidirectional typists will appreciate. (I know I do; I am learning when I can't use punctuation on Facebook, and when I can use it and it will ultimately display o.k., so I am assuming other people feel as I do.)
> 8) Does the proposed solution meet expectations in terms of the naturalness of segmentation and directional flow of enclosed units?
> The proposal assumes that opposite direction content within parentheses forms a unique directional run with opposite direction text on
> either side. When one of the sides has the embedding direction, I don't see that the context has opposite direction rather than
> embedded direction. In doubt, the BPA should not assume opposite direction for the parentheses.
> Here is an example. The text in logical order (with upper case representing RTL letters) is
> "I LIVE IN paris (france)."
> Assuming a RTL paragraph direction, the UBA will display
> ".(paris (france NI EVIL I"
> The BPA will display
> ".paris (france) NI EVIL I"
> which is better. However, since the general direction of the text is RTL, I prefer to have it displayed as
> ".(france) paris NI EVIL I"
Yes, I am in agreement here, though either solution is better than what we have now.
> To get this result, rule N0 can be reformulated as follows:
> N0. Paired punctuation marks take the opposite direction if the enclosed text contains no strong type of the embedding direction and the
> external neighbors on both sides have the opposite direction. Else the paired punctuation marks take the embedding direction.
Yes, this might work; I had suggested that the text logically preceding the opening parenthesis should hold sway, but this might work better. In any case, your suggestion will solve the problem of cases where neutral text is contained within and also following the parentheses.
> . . .
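For reference, the reformulated N0 above can be sketched in a few lines of Python. This is one reading of it under stated assumptions, not code from the PRI or from Mati: directions are reduced to the strings 'L' and 'R', the enclosed text is passed as the list of strong types it contains, each external neighbor is taken to be the nearest strong type outside the bracket, and a missing neighbor at a paragraph boundary is assumed to count as the embedding direction. The function names are hypothetical.

```python
from typing import List, Optional


def opposite(direction: str) -> str:
    """Return the other strong direction ('L' <-> 'R')."""
    return "R" if direction == "L" else "L"


def resolve_pair_mati(embedding: str, enclosed: List[str],
                      before: Optional[str], after: Optional[str]) -> str:
    """Direction of a matched bracket pair under the reformulated N0 (sketch).

    embedding: 'L' or 'R'
    enclosed:  strong types ('L'/'R') found between the brackets
    before, after: nearest strong type outside each bracket, or None at a
                   paragraph boundary (assumed to count as the embedding direction)
    """
    opp = opposite(embedding)
    before = before or embedding
    after = after or embedding
    # Opposite only if nothing inside has the embedding direction
    # and the strong context on BOTH sides is opposite.
    if embedding not in enclosed and before == opp and after == opp:
        return opp
    return embedding


# "I LIVE IN paris (france)." in an RTL paragraph: the nearest strong type
# before "(" is "paris" (L), and only "." follows ")".
print(resolve_pair_mati("R", ["L"], "L", None))  # R
```

With the parentheses resolved to R, "paris" and "france" no longer form one LTR run, which gives the ".(france) paris NI EVIL I" display Mati prefers.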

Best,

--C. E. Whitehead
cewcathar_at_hotmail.com

* * * Appendix: My Comments * * *

Below are the comments I sent:
* * *
Hi. I realize that the bidi parentheses algorithm is not currently being
discussed on the list, but I wanted to cc the list with my feedback. (I've
already sent it to Unicode using the form, but I wanted to make "double
sure" that my feedback gets to the right place; also, I've made a few
edits to the feedback I submitted, so the comments here may be a little
clearer.)

First, apologies for mentioning happy and sad faces, such as these
four: (: , :) , :( , ): . These will not normally be ordered as paired
parentheses and would be ignored by the algorithm!!! This is fine, I
think.
On the same note, I'm not completely sure about curly braces { }
(please see the discussion of single curly braces in legal documents:
http://www.oooforum.org/forum/viewtopic.phtml?t=53089 ), but in
such cases the bidi parentheses rule will simply not be applied, so
again I believe such uses are no problem! The same thing happens with
closing parentheses used after numbers and letters, as in:
1).
2.)
3.)
But again these should not normally be a problem.

Thus go ahead with the following four sets of braces:
(), [], <>, and {}.
(I am not sure about other brackets and braces, however, such as those among the miscellaneous symbols and elsewhere.)

Second,

** IMO, the algorithm should be part of Unicode's core,
as Unicode core's previous way of handling braces should be improved/corrected in the core,
even though (IMO again) the current rules HL4 and HL5 are also fine,
and do enable applications to fix/tweak the basic bidi algorithm.
What I mean is that if an app ignores HL4 and HL5 and simply applies
the Unicode bidi algorithm, the result could in some cases be
mismatched brackets. Further, the rules in the core should be fixed in
the core (IMO again), so long as they are only fixed for marks that are
universal and not language-specific. (I don't see a problem with
backwards compatibility: any workarounds should still work.)

** Also I agree it's best to locate the matching brackets before applying the rest of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself:
"If
an open parenthesis is found, push it onto a stack and continue the
scan. If a close parenthesis is found, check if the stack is not
empty and the close parenthesis is the other member of the mirrored pair
for the character on the top of the stack. If so, pop the stack and
continue the scan; else return failure. If the end of the paragraph is
reached, return success if the stack is empty; else return failure.
Success implies that all open and close parentheses, if any, in a
paragraph are matched correctly. Failure implies that there are one or
more mismatched paired punctuation marks in a run and therefore the
handling under the parenthesis algorithm will not be attempted."
** The above is fine, IMO
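The quoted scan is an ordinary stack-based bracket check. Here is a minimal Python sketch of it, assuming the four pairs proposed in section 1 above ((), [], <>, {}) as the mirrored pairs; the names PAIRS and parens_balanced are hypothetical. It also shows how a smiley or a "1)" list marker simply makes the scan fail for that paragraph.

```python
# Hypothetical pair table: the four pairs proposed in section 1 above.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def parens_balanced(paragraph: str) -> bool:
    """Return True if every paired mark in the paragraph is matched.

    Push each opener; on a closer, require a non-empty stack whose top is
    the matching opener, then pop; at the end of the paragraph, succeed
    only if the stack is empty.
    """
    stack = []
    for ch in paragraph:
        if ch in PAIRS:
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or PAIRS[stack[-1]] != ch:
                return False  # mismatched or unexpected closer
            stack.pop()
    return not stack  # leftover openers also mean failure


print(parens_balanced("abc (w3c lead, [country2])."))  # True
print(parens_balanced("nice :) face"))                 # False
print(parens_balanced("1) first item"))                # False
```

On failure, per the quoted text, the parenthesis handling is just not attempted for that paragraph, so such uses fall back to the existing rules.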
"The
rationale for following the embedding level in the normal case is that
the text segment enclosed by the paired punctuation marks will
conform to the progression of other text segments in the writing
direction. In the exception cases, the rationale to follow the opposite
direction is based on context being established between the enclosed and
adjacent segments with the same direction."
** Agreed: the embedding level should be followed in the normal case, and for both brackets.
"Other
neutral types adjacent to paired punctuation marks are resolved
subsequent to resolving the paired punctuation marks themselves, and
will therefore be influenced by that resolution."
** Agreed again, yes, so far so good.
"The
directionality of the enclosed content is opposite the embedding
direction, and at least one neighbor has a bidi level opposite to
the embedding direction O(O)E, E(O)O, or O(O)O."
"*N0. Paired
punctuation marks take the embedding direction if the enclosed text
contains a strong type of the same direction. Else, if the enclosed text
contains a strong type of the opposite direction and at least one
external neighbor also has that direction the paired punctuation marks
take the direction opposite the embedding direction."
** I disagree with the above statements.

R(R)L → R -- that is, with embedding ltr -- o.k.
L(R)R → R -- that is, with embedding ltr -- No; these brackets should take
the ltr directionality; that is, they should take the directionality of the
embedding level if the same directionality immediately precedes the opening
paren (IMO again, but see examples below).

** Same problem in an rtl embedding environment:
L(L)R → L with embedding rtl -- O.k.: the text that precedes the parens is
ltr, as is the text in parens, so fine; let the directionality be
different from the embedding directionality.
R(L)L → L with embedding rtl -- No; same problem as above in the ltr
embedding environment! The directionality of the embedding is the same as the
directionality of the text immediately preceding the parentheses. I
think this sets the reader's expectation for the display of the parens!
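To make the disagreement concrete, here is a minimal Python sketch of N0 as quoted above, applied to the four cases. Assumptions: 'L'/'R' strings stand for the strong types, the enclosed text is reduced to the strong types it contains, and each neighbor is the nearest strong type outside the bracket. The name resolve_pair_pri is hypothetical, and returning None for the situation the quoted N0 does not cover is only a placeholder.

```python
from typing import List, Optional


def resolve_pair_pri(embedding: str, enclosed: List[str],
                     before: Optional[str], after: Optional[str]) -> Optional[str]:
    """Direction of a matched bracket pair under N0 as quoted (sketch)."""
    opp = "R" if embedding == "L" else "L"
    if embedding in enclosed:
        return embedding  # enclosed text has a strong type of the embedding direction
    if opp in enclosed and opp in (before, after):
        return opp  # opposite inside, and at least one neighbor agrees
    return None  # not covered by N0 as quoted; left to the later rules


# (embedding, before, enclosed, after) for the four cases above
CASES = [("L", "R", "R", "L"), ("L", "L", "R", "R"),
         ("R", "L", "L", "R"), ("R", "R", "L", "L")]
for emb, before, inner, after in CASES:
    result = resolve_pair_pri(emb, [inner], before, after)
    print(f"{before}({inner}){after}, embedding {emb} -> {result}")
# R(R)L, embedding L -> R
# L(R)R, embedding L -> R
# L(L)R, embedding R -> L
# R(L)L, embedding R -> L
```

All four come out opposite to the embedding, including L(R)R and R(L)L, where the text just before the opening parenthesis has the embedding direction; those two are the cases disputed here.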

So for example (note: as is your convention, upper case letters designate RTL characters/text and lower case designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
O.k., if the embedding directionality of this text is ltr, these parentheses
can be displayed as ltr, since there is enough ltr text both inside and
outside of them.
However, if the embedding is rtl, the rtl text
immediately preceding the parentheses makes me expect to see the
parentheses displayed as rtl;
thus in this case, shouldn't the
directionality of the embedding and of the preceding text determine the
directionality of the whole, even for the second set of parentheses?
(sorry for the commas in my example below of "proper" bidi layout; I
can't find an rtl comma on my keyboard):

=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET
(**
Thus, I think that, together with the directionality of the embedding
level, the text that logically precedes the parens is critical to the
determination of the directionality of the parens.)

3.2. Also, sometimes there is no adjacent text on one side of a set of
brackets or the other: although parentheses fortunately rarely begin a text
block (except in programming, and in my writing), they often end a text
block, followed by a neutral punctuation mark:
L(R)N
{ Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding the above L(R)N should clearly be ltr. {And R(L)N should be rtl in an rtl text.}
R(R)N
* In an ltr text the above should run rtl nevertheless! {And L(L)N should run ltr in an rtl embedding!}
These two cases seem obvious to me!!

However, for the next two cases the solution is not so obvious to me:
L(R)N in an rtl context/embedding
* (should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (same question; should it remain ltr? Probably.)

The following cases I am again more sure of:
L(RL)N in an rtl context/embedding
* I would make the parentheses just above ltr
R(RL)N in an ltr context/embedding
* I would make the parentheses just above rtl
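The cases in 3.1 and 3.2 can be read as a single rule: the embedding direction wins when the text just before the opening bracket has the embedding direction; otherwise the brackets follow the preceding text if the enclosed text contains that direction too. The Python below is a hypothetical encoding of that reading, not a rule from the PRI. Assumptions: 'L'/'R' strings for strong types, the enclosed text reduced to its strong types, before is the nearest strong type preceding the opening bracket (None at the start of a paragraph, treated as the embedding direction), and the text after the closing bracket is ignored, so a trailing N changes nothing. The "probably" cases are encoded with the answer suggested above. The name resolve_pair_preceding is hypothetical.

```python
from typing import List, Optional


def resolve_pair_preceding(embedding: str, enclosed: List[str],
                           before: Optional[str]) -> str:
    """Bracket direction driven by the embedding and the preceding text (sketch)."""
    opp = "R" if embedding == "L" else "L"
    before = before or embedding  # paragraph start: assume embedding direction
    if before == embedding:
        return embedding  # preceding text matches the embedding, so it wins
    # Preceding text is opposite: follow it only if the enclosed text agrees.
    return opp if opp in enclosed else embedding


# (embedding, pattern, expected) for the cases listed in 3.1 and 3.2
CASES = [
    ("L", "R(R)L", "R"), ("L", "L(R)R", "L"),
    ("R", "L(L)R", "L"), ("R", "R(L)L", "R"),
    ("L", "L(R)N", "L"), ("R", "R(L)N", "R"),
    ("L", "R(R)N", "R"), ("R", "L(L)N", "L"),
    ("R", "L(R)N", "R"), ("L", "R(L)N", "L"),
    ("R", "L(RL)N", "L"), ("L", "R(RL)N", "R"),
]
for emb, pattern, expected in CASES:
    before = pattern[0]
    inner = list(pattern[pattern.index("(") + 1:pattern.index(")")])
    assert resolve_pair_preceding(emb, inner, before) == expected, pattern
print("all", len(CASES), "cases match")  # all 12 cases match
```

Unlike the reformulated N0 in Mati's comments, this reading never looks past the closing bracket, which is why a trailing neutral makes no difference to it.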

> . . .

4. A note: dictionary definitions, with phonetic transcriptions and
examples of usage, are one place where the bidi parentheses algorithm
might apply a lot; language texts are another.

Best,
--C.E. Whitehead
cewcathar_at_hotmail.com

Received on Thu Aug 02 2012 - 14:23:16 CDT
