Re: PRI #231: Bidi Parenthesis Algorithm

From: CE Whitehead <cewcathar_at_hotmail.com>
Date: Thu, 7 Jun 2012 11:48:39 -0400

Hi.

From: Konstantin Ritt <ritt.ks_at_gmail.com>
Date: Thu, 7 Jun 2012 13:06:04 +0300

> Yep, forgot to mention that the difference is in that that some paired
> quotation characters might be used alone in place of apostrophe, etc.
> so that the BPA rules could be relaxed for the quotation marks.
> Dunno about their mirroring in all languages. I thought the
> BidiMirroring.txt is supposed to list a (language-independent)
> characters and their respective mirrored brothers.

> UAX#24 section 2.2 "Handling Characters with the Common Script Property" states:

>> In determining the boundaries of a run of text in a given script,
>> programs must resolve any of the special script property values, such
>> as Common, based on the context of the surrounding characters. A simple
>> heuristic uses the script of the preceding character, which works well
>> in many cases. However, this may not always produce optimal results.
>> For example, in the text "... gamma (γ) is ...", this heuristic would
>> cause matching parentheses to be in different scripts.
>>
>> Generally, paired punctuation, such as brackets or quotation marks,
>> belongs to the enclosing or outer level of the text and should
>> therefore match the script of the enclosing text. In addition, opening
>> and closing elements of a pair resolve to the same script property
>> values, where possible. The use of quotation marks is language
>> dependent; therefore it is not possible to tell from the character code
>> alone whether a particular quotation mark is used as an opening or
>> closing punctuation. For more information, see Section 6.2, General
>> Punctuation, of [Unicode].
>>
>> Some characters that are normally used as paired punctuation may also
>> be used singly. An example is U+2019 right single quotation mark, which
>> is also used as apostrophe, in which case it no longer acts as an
>> enclosing punctuation. An example from physics would be <| or |>, where
>> the enclosing punctuation characters may not form consistent pairs.

> IIUC, this is the same problem like the one PRI #231 is intended to solve.
> For the cases like "ab" one would expect similar results provided by
> the UBA and the script itemization.

> Konstantin
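
To make the UAX#24 passage quoted above concrete, here is a rough sketch in Python comparing the "script of the preceding character" heuristic with a pair-aware variant. It is only an illustration under stated assumptions: `toy_script`, `resolve_preceding`, `resolve_paired` and the `PAIRS` table are hypothetical names made up for this example, the script lookup is a crude guess from the character name (Python's `unicodedata` does not expose the Script property), and the sample text assumes the parentheses in the UAX#24 example enclose a Greek letter. It is not the BPA from PRI #231 and not a conformant script itemizer.

```python
import unicodedata

# Illustrative subset of bracket pairs; NOT taken from any Unicode data file.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def toy_script(ch):
    """Crude stand-in for the Script property, guessed from the character name."""
    if ch.isalpha():
        name = unicodedata.name(ch, "")
        if name.startswith("GREEK"):
            return "Greek"
        if name.startswith("LATIN"):
            return "Latin"
    return "Common"


def resolve_preceding(text):
    """Simple heuristic: a Common character takes the script of what precedes it."""
    out, prev = [], "Common"
    for ch in text:
        script = toy_script(ch)
        if script == "Common":
            script = prev
        out.append(script)
        prev = script
    return out


def resolve_paired(text):
    """Same heuristic, but a closing bracket reuses the script of its opener."""
    out, prev, stack = [], "Common", []
    for ch in text:
        script = toy_script(ch)
        if script == "Common":
            script = prev
            if ch in OPENERS:
                # Remember which script the opening bracket resolved to.
                stack.append((ch, script))
            elif ch in PAIRS and stack and stack[-1][0] == PAIRS[ch]:
                # Matching closer: take the opener's script instead.
                script = stack.pop()[1]
        out.append(script)
        prev = script
    return out


text = "gamma (\u03b3) is"
for ch, simple, paired in zip(text, resolve_preceding(text), resolve_paired(text)):
    print(repr(ch), simple, paired)
```

With the simple heuristic the opening parenthesis follows the Latin text while the closing one follows the Greek letter; the pair-aware variant keeps both in the enclosing script. A quotation mark used singly as an apostrophe has no opener on the stack, so it simply falls back to the preceding-character rule in this sketch.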

2012/6/7 Philippe Verdy <verdy_p_at_wanadoo.fr>:

>> Their pairing and mirroring is not appropriate for all languages using them.
>>
>> 2012/6/7 Konstantin Ritt <ritt.ks_at_gmail.com>:
>>> Actually, they have a respective entries in the BidiMirroring.txt:
>>> 00AB; 00BB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
>>> 00BB; 00AB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
>>> and mapped into gc=Pi and gc=Pf.
>>> Even without the per-language tailoring, it seems like a good basic
>>> approximation, no?

Philippe is correct; Wikipedia gives some examples of language-specific variation in opening and closing quotation marks:
http://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

(Also, of course, as Konstantin notes, single quotation marks are used in some languages as apostrophes to indicate possession.)
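
As a quick check of the properties mentioned in this thread, the following sketch uses Python's standard `unicodedata` module to look up the General_Category and the Bidi_Mirrored flag of the angle quotation marks and of U+2019. The helper name `describe` is made up for this example. Note that `unicodedata.mirrored()` only reports the Bidi_Mirrored flag; it does not give the mirrored counterpart listed in BidiMirroring.txt, and none of this says anything about how a particular language actually uses the marks.

```python
import unicodedata


def describe(ch):
    """Return (General_Category, Bidi_Mirrored as bool) for one character."""
    return unicodedata.category(ch), bool(unicodedata.mirrored(ch))


for ch in ("\u00ab", "\u00bb", "\u2019"):
    category, mirrored = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: gc={category}, mirrored={mirrored}")
```

The angle quotation marks come out as gc=Pi and gc=Pf with the mirrored flag set, matching the BidiMirroring.txt entries quoted above, while U+2019 is gc=Pf but not flagged as mirrored.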

I have not tried, say, French-style quotation marks on Facebook, where parentheses get displayed in the wrong places when used in mixed right-to-left and left-to-right text. So I don't yet know what happens to quotation marks in mixed-directionality text.

Best,

--C. E. Whitehead
cewcathar_at_hotmail.com

                                               
Received on Thu Jun 07 2012 - 10:50:53 CDT
