Re: Need for Level Direction Mark

From: Peter Edberg <pedberg_at_apple.com>
Date: Fri, 16 Sep 2011 18:59:47 -0700

I would like to address some of the feedback on LDM at http://www.unicode.org/review/pri205/ and in this thread. First, a review of some use cases.

One example that was given in the PRI was of an Arabic numeric date of the form dd/MM/yyyy in which the fields should flow left-to-right (e.g. ٠٩/١٦/٢٠١١) in a left-to-right context (i.e. the date and perhaps some other Arabic text are in a mainly Latin-script paragraph), but should flow right-to-left (e.g. ٢٠١١/١٦/٠٩) in a right-to-left context (e.g. a primarily Arabic-script paragraph). The date may or may not be preceded or followed by Arabic letters. If the direction context is known when the date format is created, then RLMs can be used if necessary to force the desired flow. However, if the date format is being created from standard data (such as from CLDR) and inserted, or if it is copied from some other context, it may not end up laid out as desired. To address this, an LDM could be used before and after the date, and before each '/' in the date. This would produce the desired result in all cases.
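To make the placement concrete, here is a minimal sketch of where the marks would go. LDM is only a proposal and has no code point, so the placeholder below is a hypothetical stand-in (a private-use character), and the helper name mark_date is my own; passing RLM instead gives the existing approach for a known right-to-left context.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Hypothetical stand-in: LDM is not an encoded character, so a
# private-use code point is used here purely as a placeholder.
LDM_PLACEHOLDER = "\ue000"


def mark_date(date: str, mark: str = LDM_PLACEHOLDER, sep: str = "/") -> str:
    """Put `mark` before and after the date and before each separator."""
    fields = date.split(sep)
    # Joining with mark + sep places one mark directly before every separator.
    return mark + (mark + sep).join(fields) + mark


# The date from the example above, written with escapes to avoid
# display reordering in the source code itself.
date = "\u0660\u0669/\u0661\u0666/\u0662\u0660\u0661\u0661"
marked = mark_date(date)
```

For a date with two separators this adds four marks in total: one at each end and one before each '/'.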

Another example is that in UAX #9 section 5.6 (http://www.unicode.org/reports/tr9/#Separators). In this case an LDM could be used before each '-' to achieve the correct layout regardless of overall page direction.

Also note that the behavior of '/' between digits depends on language & locale as well as usage. For example, numeric fractions represented as numerator/denominator always flow left-to-right in Hebrew regardless of direction context (e.g. 1/3), whereas in a right-to-left Arabic context they flow right-to-left (e.g. ٣/١ or preferably ١\٣). This affects the degree to which heuristics can be used to determine LDM-like behavior.
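As a rough illustration of what a heuristic has to work with, the sketch below lists the bidi classes of the characters in these fractions using Python's standard unicodedata module. The helper name bidi_classes is my own. The classes are the same whether the string is a fraction or a date, so the character properties alone cannot tell the usages apart.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return the Unicode bidi class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


# European digits around a solidus, as in the Hebrew fraction example.
hebrew_fraction = bidi_classes("1/3")

# Arabic-Indic digits around a solidus, then around a reverse solidus.
arabic_fraction = bidi_classes("\u0661/\u0663")
arabic_fraction_backslash = bidi_classes("\u0661\\\u0663")
```

Here the European digits report EN, the Arabic-Indic digits AN, '/' reports CS, and '\' reports ON.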

Second, responses to some of the suggestions/comments:

1. Richard Wordingham suggested that for the Arabic date example (dd/MM/yyyy), surrounding the '/' with RLM before and LRM after works as well as using LDM before the '/'. And for an isolated instance of such a date, it does; having opposite strong directions on either side of the neutral forces it to take on the direction of the embedding level, by rule N2, and though the extra RLMs and LRMs will get moved around in layout depending on the embedding level, it does not matter because they are invisible.

However, it does not handle the situation in which the date is part of other text, and may be preceded or followed by Arabic letters (with an intervening space); there are layout interactions between the Arabic letters and adjacent Arabic digits, since the digits are not treated as being part of a longer sequence due to the direction marks associated with the '/'. This can be solved by placing an LDM before and after the date, as well as before each '/'. However, using an RLM LRM sequence before and after the date causes the spaces around the date to reorder.

Furthermore, for the example in UAX #9 section 5.6, using RLM and LRM around the '-' causes reordering of the adjacent spaces, while using LDM before each '-' solves the layout problem.

2. Philippe Verdy suggests that the intent of LDM is to change the bidi class of a CS such as '/' to match the bidi class of the preceding EN character. Actually, the intent of LDM is to act like either LRM or RLM depending on the direction associated with the current embedding level; it has nothing to do with the class of any preceding character. Mr. Verdy also suggests that instead of using LDM, the solution is to encode another '/' with bidi class R. However, that is equivalent to using RLM before '/' and does not solve the problem described.

3. Kent Karlsson suggests several possible changes to the UBA. Considering for the moment just the portion dealing with "segmenting punctuation" (Pd etc.) and segment separators, since that is related to the intent of LDM: Mr. Karlsson suggests the addition of a new bidi class "Embedding Level Segment Separator", which basically has the directional behavior of LDM, and could be applied to any character (for example '/' in the date example) as an override. This aligns with option 3 in the background document (and once there is an LDM-like bidi class, I think it is a small step to actually encode a character like LRM or RLM that has this new class). Mr. Karlsson further suggests that characters of general category Pd, at least if they are surrounded by space, should behave as if they have this "Embedding Level Segment Separator" class. This would mean that the example in UAX #9 section 5.6 would be handled without explicit LDM. This is an interesting idea, but I have not examined all of the implications.
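The difference between the RLM/LRM workaround in point 1 and the intended LDM behavior in point 2 can be sketched roughly as follows. This is a much-simplified model of rules N1 and N2 only, not a UBA implementation: it ignores weak types, the treatment of numbers as R in N1, and run boundaries. The function names are my own, and ldm_like is hypothetical since LDM is not encoded.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def embedding_direction(level: int) -> str:
    """Even embedding levels are left-to-right, odd levels right-to-left."""
    return "L" if level % 2 == 0 else "R"


def resolve_neutral(before: str, after: str, level: int) -> str:
    """Simplified N1/N2 for a neutral between two strong types ('L' or 'R')."""
    if before == after:
        # N1: neighbours agree, so the neutral takes their direction.
        return before
    # N2: otherwise the neutral takes the embedding direction.
    return embedding_direction(level)


def ldm_like(level: int) -> str:
    """Hypothetical: the mark an LDM would act as at this embedding level."""
    return LRM if embedding_direction(level) == "L" else RLM


# RLM '/' LRM: opposite strong types, so '/' follows the embedding level.
workaround_ltr = resolve_neutral("R", "L", 0)
workaround_rtl = resolve_neutral("R", "L", 1)

# Same strong type on both sides: the embedding level is never consulted.
neighbours_win = resolve_neutral("R", "R", 0)
```

In this model the workaround reaches the embedding direction only indirectly, by arranging for the neighbours to disagree, whereas ldm_like depends on the level alone.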

At any rate, it seems that if LDM-like behavior is needed, there is no alternative using existing controls. As Kent Karlsson says in the e-mail discussion, "All the workarounds w.r.t. LDM depend on the directionality of neighbouring characters, not directly on the embedding level direction. Therefore I think none of them will work properly in all cases (even though they may give the seemingly correct result in many cases)." Either we decide that this behavior is beyond the scope of the UBA, or we decide on one of the options presented (or come up with another).

- Peter E
Received on Fri Sep 16 2011 - 21:04:48 CDT
