Re: PRI #231: Bidi Parenthesis Algorithm

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 7 Jun 2012 00:21:00 +0200

Sorry, I did not see that these characters were part of the proposed BPA:

U+2E22, U+2E23, ⸢ ⸣, TOP LEFT HALF BRACKET, TOP RIGHT HALF BRACKET
U+2E24, U+2E25, ⸤ ⸥, BOTTOM LEFT HALF BRACKET, BOTTOM RIGHT HALF BRACKET

So this resolves the issue for authors of mixed Latin/Arabic and
Latin/Hebrew text. I just hope that these paired punctuation marks will
get better support in the core fonts of various OSes

(suggestion: include them in all fonts in the Arial, Verdana, Times
New Roman, Courier New, and Segoe UI families on Windows, and
Helvetica, Times and Courier on Mac OS; I have nothing to propose for
Linux, since distributions can easily and rapidly include them in
updates of their free core fonts; core fonts for smartphone
OSes should also include them).
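
For anyone who wants to check how the four half brackets listed above are
classified on their own system, here is a minimal sketch using Python's
standard unicodedata module. The helper name describe() is only an
illustration, and the result depends on the Unicode version bundled with
the local Python build. It reports character properties only: it says
nothing about whether an installed font has glyphs for them, and it does
not implement the BPA itself.

```python
import unicodedata

# The four half brackets named in the message above.
HALF_BRACKETS = ["\u2E22", "\u2E23", "\u2E24", "\u2E25"]


def describe(ch):
    """Return (code point, name, general category, bidi class, mirrored flag).

    Hypothetical helper for illustration; values come from the Unicode
    Character Database shipped with the running Python interpreter.
    """
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),         # Ps = open, Pe = close punctuation
        unicodedata.bidirectional(ch),    # bidi class of the character
        bool(unicodedata.mirrored(ch)),   # Bidi_Mirrored property
    )


if __name__ == "__main__":
    for ch in HALF_BRACKETS:
        print(*describe(ch))
```

The general category (Ps or Pe) is what distinguishes the opening member
of each pair from the closing one.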

2012/6/6 Philippe Verdy <verdy_p_at_wanadoo.fr>:
> Great. I really hope that all major web browsers will be updated to
> implement it in their renderers.
>
> This is the most urgent need, even before word processors, whose
> rendering is much more predictable in their rich text formats, and
> even if the text encoding was specially contrived to work with UBA
> only.
>
> BPA should not be enabled automatically when opening a word processor
> document created with a version where it was not explicitly enabled,
> but new documents may still be created with it and will store internal
> metadata specifying that BPA was enabled. Word processor documents
> have all the capabilities to support version tracking, so this is much
> less of a problem for stability than in web formats (mostly HTML).
>
> Plain-text documents (*.txt) are another problem, but they are created
> only with a single left-margin, or a single right-margin alignment
> where they are correctly readable. I've not found many of them
> contrived in such a way that their Arabic content (when the document is
> written in Arabic) looks correct when reading it with a LTR
> document-level presentation. It is assumed that this document-level
> presentation will be changed immediately by the user opening the file
> for display, if the "notepad-like" application does not autodetect
> this preferred presentation.
>
> But BPA will be highly desirable and should be enabled by default for
> presenting short plain-text data fields (limited to one line or less,
> without any line break), e.g. in document titles or filenames.
>
> In fact, even if BPA does not resolve all ambiguities, in most texts
> it will also offer a good way to avoid directional control formats:
> surround the problematic embedded texts with parenthesis-like
> pairs.
>
> Ideographic quotation marks are included (those that look like half
> square brackets, or corners).
>
> But the usual quotation marks used in Latin/Greek/Cyrillic keep their
> ambiguity (because their pairing is language dependent), and we still
> have no suitable quotation marks for RTL scripts that would allow
> using BPA for them as well.
>
> The solution offered for now by this BPA to text authors could be to
> use the ideographic truncated square brackets as well in mixed RTL/LTR
> contexts, but in my opinion they are too high and not suitable for use
> with mixed Latin/Arabic or Latin/Hebrew. I would highly prefer
> that they pair correctly, both under the baseline or both above the
> Hebrew/Arabic points and Latin diacritics, in a smaller version where
> their vertical line just goes from the ascender line to the middle of
> the x-height, or from the descender line up to the baseline, with
> a horizontal width equal to this vertical height, and a very narrow
> side-bearing on their internal side (the external side-bearing being
> allowed to be a bit larger); they could also use the metrics of
> existing square brackets.
>
> This would mean disunifying the existing ideographic corner brackets
> to support a more suitable use with
> Latin/Greek/Cyrillic/Hebrew/Arabic. For now, this BPA just cleanly
> resolves the case of text mixing LTR East Asian scripts (sinograms,
> Hangul/jamo, kana, "wide" Latin, Bopomofo) and RTL scripts, because
> these ideographic square brackets are suitable for them, but not
> really the common Latin/Arabic and Latin/Hebrew cases. In their
> disunified versions, these corner brackets would have very different
> metrics, including in their representative glyphs in the charts (and
> they are definitely not the same as the "NOT SIGN", or the corners in
> the Box Drawing symbols: wrong placement of the horizontal segment, and
> wrong joining).
>
> 2012/6/6  <announcements_at_unicode.org>:
>> The Unicode Technical Committee is seeking feedback on a proposal to enhance
>> the Unicode Bidirectional Algorithm (UAX #9) with additional logic--a
>> bidirectional parenthesis algorithm (BPA)--for processing paired punctuation
>> marks such as parentheses. This proposal is intended to produce better
>> bidi-layout results in common text sequences that involve paired punctuation
>> marks. Details of the proposal, with questions for reviewers and a detailed
>> background document are available through the PRI #231 page:
>> http://www.unicode.org/review/pri231/
Received on Wed Jun 06 2012 - 17:24:40 CDT
