Bidi reordering of soft hyphen

From: James Clark <jjc_at_jclark.com>
Date: Tue, 1 Apr 2014 12:51:11 +0700

Suppose I have a paragraph (uppercase = RTL):

   CARROT IS car\u00ADrot IN ENGLISH

and the paragraph gets broken at the soft hyphen.

Is the correct ordering for the first line

  car- SI TORRAC

or

  -car SI TORRAC

? I did not succeed in deducing the answer from UAX #9. Soft hyphen has
bidi class BN, which means it gets removed by rule X9, and so, if I have
understood correctly, it doesn't have a defined embedding level.
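
As a quick check of the premise, the bidi class of soft hyphen can be read from the Unicode Character Database, and the removal step of rule X9 can be sketched in a few lines. This is only an illustration using Python's standard unicodedata module; the helper name remove_x9 is hypothetical, and the sketch covers just the removal of characters, not the rest of the algorithm.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

# Bidi classes that rule X9 removes: the explicit embedding and
# override codes, PDF, and boundary neutrals (BN).
X9_REMOVED = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def remove_x9(text):
    """Hypothetical helper: drop the characters that rule X9 removes."""
    return "".join(
        ch for ch in text if unicodedata.bidirectional(ch) not in X9_REMOVED
    )


print(unicodedata.bidirectional(SOFT_HYPHEN))
print(remove_x9("car" + SOFT_HYPHEN + "rot"))
```

Because the soft hyphen is gone before levels are resolved, nothing in the resolution rules assigns a level to the visible hyphen that a renderer draws at the break.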

I'm guessing the correct ordering is the first one, but I don't trust my
instincts here. (In particular, I wondered whether this was analogous to
the case where rule L1 resets embedding levels so that trailing whitespace
is at the visual end of the line.)
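
To make the two candidates concrete, here is a minimal sketch that applies the reversal of rule L2 to hand-assigned embedding levels. The levels are assumptions, not the output of a real UAX #9 implementation: the paragraph level is 1, the uppercase (RTL) letters and the spaces are at level 1, "car" is at level 2, and the visible hyphen is given either the level of the preceding letter or the paragraph level. The function name reorder_line is hypothetical, and the sketch does not say which assignment is correct.

```python
def reorder_line(text, levels):
    """Reverse runs as in rule L2: from the highest level down to the
    lowest odd level, reverse each contiguous run of characters whose
    level is at or above the current level."""
    chars = list(text)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return text
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                # Run positions are unchanged by the reversal, so the
                # levels list itself does not need to be permuted.
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


# First line in logical order, with "-" standing for the visible hyphen.
line = "CARROT IS car-"

# Assumption A: the hyphen takes the level of the preceding letter (2).
levels_a = [1] * 10 + [2] * 4
# Assumption B: the hyphen takes the paragraph level (1), by analogy
# with the treatment of trailing whitespace in rule L1.
levels_b = [1] * 10 + [2] * 3 + [1]

print(reorder_line(line, levels_a))  # car- SI TORRAC
print(reorder_line(line, levels_b))  # -car SI TORRAC
```

Under these assumptions the question reduces to which level the visible hyphen should receive.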

More generally, suppose you have a markup language which has a construct
for discretionary breaks, as in TeX, with pre-break, post-break and
no-break text. Soft hyphen is a special case of this (where the pre-break
text consists of a hyphen, and the post-break and no-break texts are empty); you
can also regard space as a kind of discretionary break (post-break text
empty, no-break text contains the space, pre-break text either contains the
space or is empty, depending on how you want to think about it). Obviously
the embedding level for the no-break text should be resolved as if the
discretionary break were replaced by the no-break text (which is consistent
with a bidi class of BN for soft hyphen). However, for the pre- and
post-break text, it is not clear to me what the right way is to resolve
embedding levels (or how their content should be restricted so that there
is a sensible way to resolve the embedding levels). I would be grateful for
any suggestions.
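
For concreteness, the discretionary-break construct described above could be modelled roughly as follows. This is a sketch only: the names Discretionary, SOFT_HYPHEN, SPACE and render are hypothetical, the space is modelled with an empty pre-break text (one of the two readings mentioned above), and the code produces lines in logical order without attempting to resolve embedding levels for the pre-break or post-break text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discretionary:
    """A discretionary break with the three texts used in TeX."""
    pre_break: str   # shown at the end of the line if the break is taken
    post_break: str  # shown at the start of the next line if taken
    no_break: str    # shown in place if the break is not taken


# Soft hyphen: only the pre-break text is non-empty.
SOFT_HYPHEN = Discretionary(pre_break="-", post_break="", no_break="")
# Space: only the no-break text is non-empty (pre-break taken as empty).
SPACE = Discretionary(pre_break="", post_break="", no_break=" ")


def render(items, break_at=None):
    """Return the lines, in logical order, for a list of strings and
    Discretionary items, breaking at the item with index break_at."""
    lines = [""]
    for i, item in enumerate(items):
        if isinstance(item, Discretionary):
            if i == break_at:
                lines[-1] += item.pre_break
                lines.append(item.post_break)
            else:
                lines[-1] += item.no_break
        else:
            lines[-1] += item
    return lines


print(render(["car", SOFT_HYPHEN, "rot"]))
print(render(["car", SOFT_HYPHEN, "rot"], break_at=1))
```

The unbroken case is the easy one, since the no-break text simply takes part in the paragraph as ordinary content. The open question is what levels the pre_break and post_break strings should receive once a break is taken.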

James

Received on Tue Apr 01 2014 - 09:26:17 CDT
