Re: PRI #231: Bidi Parenthesis Algorithm

From: CE Whitehead <>
Date: Thu, 7 Jun 2012 11:48:39 -0400


From: Konstantin Ritt <>

Date: Thu, 7 Jun 2012 13:06:04 +0300

> Yep, forgot to mention that the difference is that some paired
> quotation characters might be used alone in place of an apostrophe, etc.,
> so that the BPA rules could be relaxed for the quotation marks.
> Dunno about their mirroring in all languages. I thought
> BidiMirroring.txt is supposed to list (language-independent)
> characters and their respective mirrored brothers.

> UAX#24 section 2.2 "Handling Characters with the Common Script Property" states:

>> In determining the boundaries of a run of text in a given script,
>> programs must resolve any of the special script property values, such
>> as Common, based on the context of the surrounding characters. A simple
>> heuristic uses the script of the preceding character, which works well
>> in many cases. However, this may not always produce optimal results.
>> For example, in the text "... gamma () is ...", this heuristic would
>> cause matching parentheses to be in different


>> Generally, paired punctuation, such as brackets or quotation marks,
>> belongs to the enclosing or outer level of the text and should
>> therefore match the script of the enclosing text. In addition, opening
>> and closing elements of a pair resolve to the same script property
>> values, where possible. The use of quotation marks is language
>> dependent; therefore it is not possible to tell from the character code
>> alone whether a particular quotation mark is used as an opening or
>> closing punctuation. For more information, see Section 6.2,
>> General Punctuation, of [Unicode].


>> Some characters that are normally used as paired punctuation may also
>> be used singly. An example is U+2019 right single quotation mark, which
>> is also used as apostrophe, in which case it no longer acts as an
>> enclosing punctuation. An example from physics would be <| or |>,
>> where the enclosing punctuation characters may not form consistent pairs.
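
To make the heuristic in the UAX #24 passage concrete, here is a rough Python sketch. It is only an illustration under loud assumptions: Python's unicodedata module has no Script property, so the hypothetical helper rough_script() uses the first word of the character name as a stand-in, and I put a Greek gamma inside the parentheses as an assumed stand-in for whatever character did not survive in the quoted example. resolve_paired() is my own naive reading of "opening and closing elements of a pair resolve to the same script", not anything the UAX specifies.

```python
import unicodedata

# Hypothetical helper: Python's unicodedata has no Script property, so the
# first word of the character name stands in for it (an assumption made
# only for this sketch, not the real Scripts.txt data).
def rough_script(ch):
    if ch.isalpha():
        return unicodedata.name(ch).split()[0].title()
    return "Common"

def resolve_simple(text):
    """Give each Common character the script of the preceding character."""
    resolved, previous = [], "Common"
    for ch in text:
        script = rough_script(ch)
        if script == "Common":
            script = previous
        else:
            previous = script
        resolved.append(script)
    return resolved

PAIRS = {"(": ")", "[": "]", "{": "}"}

def resolve_paired(text):
    """Same heuristic, then force each closer to match its opener."""
    resolved = resolve_simple(text)
    stack = []  # (expected closing character, index of the opener)
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], i))
        elif stack and ch == stack[-1][0]:
            _, opener = stack.pop()
            resolved[i] = resolved[opener]
    return resolved

text = "gamma (\u03b3) is"  # assumed example text with a Greek gamma
simple = resolve_simple(text)
paired = resolve_paired(text)
i, j = text.index("("), text.index(")")
print("simple:", simple[i], simple[j])
print("paired:", paired[i], paired[j])
```

With the simple heuristic the opening parenthesis follows Latin text and the closing one follows the Greek letter, so the two end up in different scripts; the paired variant pulls the closer back to the script of its opener. A lone apostrophe or the physics-style brackets would of course defeat such naive pairing.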

> IIUC, this is the same problem as the one PRI #231 is intended to solve.
> For cases like "ab" one would expect similar results from
> the UBA and the script itemization.

> Konstantin

2012/6/7 Philippe Verdy <>:

>> Their pairing and mirroring is not appropriate for all languages using them.


>> 2012/6/7 Konstantin Ritt <>:

>>> Actually, they have respective entries in BidiMirroring.txt:



>>> and mapped into gc=Pi and gc=Pf.

>>> Even without the per-language tailoring, it seems like a good basic

>>> approximation, no?

Philippe is correct; Wikipedia gives some examples of language-specific variation in opening and closing quotation marks:

(Also, of course, as Konstantin notes, the single quotation marks are used in some languages as apostrophes to indicate possession.)
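
For anyone who wants to check what the character data says about mirroring and gc=Pi/gc=Pf, a minimal Python sketch follows. The list of quotation marks is my own pick (the entries Konstantin listed did not survive in the quoted message), and describe() is a hypothetical helper name. It relies on unicodedata.category() and unicodedata.mirrored() from the standard library, which reflect whatever Unicode version the Python build ships with.

```python
import unicodedata

# Assumed sample: angle quotation marks followed by curly quotation marks.
QUOTES = "\u00AB\u00BB\u2039\u203A\u2018\u2019\u201C\u201D"

def describe(ch):
    """Return (code point, General_Category, Bidi_Mirrored flag) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),           # e.g. "Pi" or "Pf"
        bool(unicodedata.mirrored(ch)),     # True if Bidi_Mirrored=Yes
    )

for ch in QUOTES:
    print(*describe(ch), unicodedata.name(ch))
```

The angle quotation marks should report as mirrored and the curly ones as not mirrored, while both kinds are split between Pi and Pf. None of this says anything about which mark a given language actually uses as the opener, which is the point Philippe raised.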

I have not yet tried, say, French-style quotation marks on Facebook, where parentheses get displayed in the wrong places when used in mixed right-to-left and left-to-right text. So I don't know yet what happens to quotation marks in mixed-directionality text.


--C. E. Whitehead

Received on Thu Jun 07 2012 - 10:50:53 CDT
