Re: PRI #231: Bidi Parenthesis Algorithm

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 6 Jun 2012 23:50:22 +0200

Great. I really hope that all major web browsers will be updated to
implement it in their renderers.

This is the most urgent need, ahead of word processors, whose
rendering is much more predictable in their rich-text formats, even
when the text encoding was specially contrived to work with the UBA
alone.

BPA should not be enabled automatically when opening a word processor
document created with a version where it was not explicitly enabled,
but new documents can be created with internal metadata specifying
that BPA is enabled. Word processor documents have all the
capabilities needed to support version tracking, so this is much less
of a stability problem than in web formats (mostly HTML).

Plain-text documents (*.txt) are another problem, but they are created
with only a single alignment, left-margin or right-margin, in which
they are correctly readable. I've not found many of them contrived in
such a way that their Arabic content (when the document is written in
Arabic) looks correct when read with an LTR document-level
presentation. It is assumed that the user opening the file for display
will immediately change this document-level presentation, if the
"notepad-like" application does not autodetect the preferred
presentation.

But BPA will be highly desirable and should be enabled by default for
presenting short plain-text data fields (limited to one line or less,
without any line break), e.g. document titles or filenames.

In fact, even if BPA does not resolve all ambiguities, in most texts
it will also offer a good way to avoid directional formatting
controls: surround the problematic embedded text with a
parenthesis-like pair.

Ideographic quotation marks are included (those that look like half
square brackets, or corners).

The usual quotation marks used in Latin/Greek/Cyrillic keep their
ambiguity, however (because their pairing is language dependent), and
we still have no suitable quotation marks for RTL scripts that would
allow using BPA for them as well.
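
As a rough illustration of that distinction (a sketch only, not the
proposal's actual pair table): Python's standard unicodedata module
does not expose the BPA bracket list, so the hypothetical helper
bracket_like() below approximates "parenthesis-like" as general
category Ps/Pe, Bidi_Mirrored, and bidi class ON. That approximation
is my assumption, and the real list may differ from it.

```python
import unicodedata


def bracket_like(ch: str) -> bool:
    """Hypothetical proxy for 'parenthesis-like' (an assumption, not the BPA list).

    True when the character is an opening or closing punctuation mark
    (general category Ps or Pe), is Bidi_Mirrored, and has bidi class ON.
    """
    return (
        unicodedata.category(ch) in ("Ps", "Pe")
        and unicodedata.mirrored(ch) == 1
        and unicodedata.bidirectional(ch) == "ON"
    )


# Characters discussed in this message.
samples = [
    "(", ")",            # ASCII parentheses
    "\u300c", "\u300d",  # ideographic corner brackets
    "\u201c", "\u201d",  # curly double quotation marks
    "\u00ac",            # NOT SIGN
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bracket_like(ch)}")
```

Under this proxy the corner brackets pass like ordinary parentheses,
while the curly quotation marks (categories Pi/Pf, not mirrored) and
NOT SIGN (a math symbol) do not.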

The solution this BPA offers text authors for now could be to use the
ideographic truncated square brackets in mixed RTL/LTR contexts as
well, but in my opinion they are too tall and not suitable for use
with mixed Latin/Arabic or Latin/Hebrew. I would much prefer that they
pair correctly, either both under the baseline or both above the
Hebrew/Arabic points and Latin diacritics, in a smaller version whose
vertical line runs just from the ascender line to the middle of the
x-height, or from the descender line up to the baseline, with a
horizontal width equal to this vertical height and a very narrow
side-bearing on the internal side (the external side-bearing being
allowed to be a bit larger); they could also use the metrics of the
existing square brackets.

This would mean disunifying the existing ideographic corner brackets
to support a more suitable use with
Latin/Greek/Cyrillic/Hebrew/Arabic. For now, this BPA cleanly resolves
only the case of text mixing LTR East Asian scripts (sinograms,
Hangul/jamo, kana, "wide" Latin, Bopomofo) with RTL scripts, because
these ideographic corner brackets are suitable for them, but not
really the common Latin/Arabic and Latin/Hebrew cases. In their
disunified versions, these corner brackets would have very different
metrics, including in their representative glyphs in the charts (and
they are definitely not the same as the "NOT SIGN" or the corners in
the box-drawing symbols: wrong placement of the horizontal segment,
and wrong joining).

2012/6/6 <announcements_at_unicode.org>:
> The Unicode Technical Committee is seeking feedback on a proposal to enhance
> the Unicode Bidirectional Algorithm (UAX #9) with additional logic--a
> bidirectional parenthesis algorithm (BPA)--for processing paired punctuation
> marks such as parentheses. This proposal is intended to produce better
> bidi-layout results in common text sequences that involve paired punctuation
> marks. Details of the proposal, with questions for reviewers and a detailed
> background document are available through the PRI #231 page:
> http://www.unicode.org/review/pri231/
Received on Wed Jun 06 2012 - 16:54:16 CDT
