Re: Need for Level Direction Mark from Philippe Verdy on 2011-09-18 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 19 Sep 2011 05:44:27 +0200

2011/9/19 Peter Edberg <pedberg_at_apple.com>:
> Philippe,
>
> On Sep 17, 2011, at 12:54 PM, Philippe Verdy wrote:
>
>> 2011/9/17 Peter Edberg <pedberg_at_apple.com>:
>>> 2. Philippe Verdy suggests that the intent of LDM is to change the bidi class of a CS such as '/' to match the bidi class of the preceding EN character. Actually, the intent of LDM is to act like either LRM or RLM depending on the direction associated with the current embedding level; it has nothing to do with the class of any preceding character. Mr. Verdy also suggests that instead of using LDM, the solution is to instead encode another '/' with bidi class R. However, that is equivalent to using RLM before '/' and does not solve the problem described.
>>
>> Well, I was still not sure about the intent of this ambiguous LDM. But
>> if the CS character should adopt the embedding direction to reorder
>> the numeric fields that it separates, I still think that the best is
>> to embed those numeric fields in RLE..PDF or LRE..PDF (you can choose
>> them arbitrarily, independently of the numeric characters used in
>> those fields; or even if those fields contain letters such as an
>> abbreviated month name, or a CJK telegraphic abbreviation for month
>> numbers or year numbers; but if the content of the field contains
>> itself some whitespaces or variable characters, the choice of LRE..PDF
>> or RLE..PDF would not be without importance, for the inner
>> presentation of the content of this field, but would still have no
>> influence outside of the field).
>
> Yes, for the numeric date example, wrapping each of the numeric
> fields in, say, LRE..PDF does solve the problem; since embedding
> boundaries are treated as sol/eol, a neutral character such as '/'
> between embedding boundaries will take on the embedding direction.
> And the digits inside the embeddings will not interact for layout with
> letters outside the date.
>
> However, as you suggest, for a more complex example such as
> that in UAX #9 section 5.6, to obtain proper behavior, one would
> need to know the overall page direction in order to choose whether
> to wrap each text field in LRE..PDF or RLE..PDF. The whole point
> of LDM was to be able to create semi-structured elements such as
> the example in UAX #9 section 5.6 *without* knowing in advance
> the direction context in which the element would be used.

You absolutely don't need to know the direction of the context in
advance to use LRE..PDF or RLE..PDF. It works in both directions,
ordering and separating the fields in the same order as that context.
So yes, LRE..PDF and RLE..PDF do create a semi-structure, which fits.
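A minimal sketch of the wrapping being discussed, assuming the fields are delimited by a single "/". The helper name wrap_fields is hypothetical. It only builds the string: LRE (U+202A) or RLE (U+202B) goes before each field and PDF (U+202C) after it, and each separator stays outside the embeddings. How the result is displayed is left to the bidi implementation.

```python
# Explicit embedding controls from the Unicode Bidirectional Algorithm (UAX #9).
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_fields(text: str, sep: str = "/", rtl: bool = False) -> str:
    """Hypothetical helper: wrap each sep-delimited field in LRE..PDF
    (or RLE..PDF when rtl is True); separators stay outside the embeddings."""
    opener = RLE if rtl else LRE
    return sep.join(f"{opener}{field}{PDF}" for field in text.split(sep))


if __name__ == "__main__":
    # Print code points so the invisible controls can be seen.
    print(" ".join(f"U+{ord(c):04X}" for c in wrap_fields("12/31")))
    # U+202A U+0031 U+0032 U+202C U+002F U+202A U+0033 U+0031 U+202C
```

Choosing rtl=True only changes the opening control. The separator characters themselves are never modified.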

LRE..PDF and RLE..PDF also map one-to-one onto the well-known HTML
"dir=" attribute of inline elements, once that attribute is mapped to
the equivalent CSS properties for bidi embedding, so these Bidi
controls (strongly discouraged in HTML) can be avoided completely.
This means that a date visually rendered as "12/31/2011" in an
LTR-only document can be formatted in HTML as:

12/31/2011

so that it will be reordered contextually as "2011/31/12" depending on
the direction of the inline text before it (or after it, if no strong
direction is set by previous inline content within the same containing
block or embedded span). If there is no strong context at all in that
block or span, the direction remains weak, and the block or span
itself inherits its direction from a lower embedding level, or from
the default direction set by the document language if there is no
other context.
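In this copy of the message, the HTML example appears only as the plain text "12/31/2011", and any markup it carried is not visible. The following is a hedged sketch only. It assumes one span per field, each with a dir attribute, and the helper name date_to_html is hypothetical. In the CSS 2.1 model for HTML, dir="ltr" or dir="rtl" on an inline element corresponds to "direction: ltr|rtl" plus "unicode-bidi: embed". That is the markup counterpart of LRE..PDF or RLE..PDF.

```python
import html

# CSS 2.1's suggested rules for HTML's dir attribute on inline elements;
# these are the markup counterparts of LRE..PDF and RLE..PDF.
DIR_TO_CSS = {
    "ltr": "direction: ltr; unicode-bidi: embed",  # like LRE..PDF
    "rtl": "direction: rtl; unicode-bidi: embed",  # like RLE..PDF
}


def date_to_html(date: str, sep: str = "/", direction: str = "ltr") -> str:
    """Hypothetical helper: put each field of a date in its own <span dir=...>,
    leaving the separators outside the spans, as with the control characters."""
    if direction not in DIR_TO_CSS:
        raise ValueError(f"unsupported direction: {direction!r}")
    spans = (f'<span dir="{direction}">{html.escape(f)}</span>' for f in date.split(sep))
    return html.escape(sep).join(spans)


if __name__ == "__main__":
    print(date_to_html("12/31/2011"))
    # <span dir="ltr">12</span>/<span dir="ltr">31</span>/<span dir="ltr">2011</span>
```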

For this reason, I am convinced that, in the absence of such
embedding, CS characters should always limit their context to their
immediate neighboring characters, so that "12/31/2011" always keeps
the same field order in all contexts when there is no markup from an
external stylesheet and no Bidi controls, and so that "12/31+2" is
NEVER reordered contextually (preserving the mathematical meaning of
operators whose operands cannot freely commute).
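For reference, the existing UBA rule W4 already resolves a separator from its immediate neighbors only. A single ES between two EN becomes EN, and a single CS between two numbers of the same type (EN or AN) takes that type. What follows is a simplified Python sketch of that one rule. It assumes rules W1 to W3 have already run and ignores embedding levels, and the function names are hypothetical. The standard-library call unicodedata.bidirectional returns a character's bidi class.

```python
import unicodedata


def apply_w4(classes: list[str]) -> list[str]:
    """Simplified UBA rule W4 on a list of bidi class names.
    Neighbours are read from the input list, so a separator that was just
    changed to a number cannot make an adjacent separator look 'single'."""
    out = list(classes)
    for i in range(1, len(classes) - 1):
        prev, cur, nxt = classes[i - 1], classes[i], classes[i + 1]
        if cur == "ES" and prev == nxt == "EN":
            out[i] = "EN"
        elif cur == "CS" and prev == nxt and prev in ("EN", "AN"):
            out[i] = prev
    return out


def classes_of(text: str) -> list[str]:
    """Bidi class of each character, from the Unicode Character Database."""
    return [unicodedata.bidirectional(c) for c in text]


if __name__ == "__main__":
    print(apply_w4(classes_of("12/31+2")))  # every position resolves to EN
    print(apply_w4(classes_of("12//31")))   # the doubled "/" stays CS
```

Reading from the unmodified input matters. Without it, "12//31" would wrongly turn both slashes into EN.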

Note that "/" should also not be mirrored into "\" by default, at
least not in mathematical formulas, even if there may be informal uses
where this happens. As this is a discretionary case, it is certainly
better to use the correct "\" character in texts where it is needed:
this default should only be changed by specific stylesheets, or when a
specific, non-recommended OpenType feature has been explicitly enabled.
The same remark applies to the feature-based change of natural digits
into "national" forms, and to the related controls found in the UCS,
which can turn the AN class into the EN class or the reverse. These
controls are now deprecated as well, as they don't work gracefully
with the current UBA, which ignores those mirroring controls.
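These defaults can be checked against the Unicode Character Database. Both "/" and "\" have Bidi_Mirrored=No, unlike paired characters such as "(". The deprecated format controls U+206A..U+206F, which include INHIBIT/ACTIVATE SYMMETRIC SWAPPING and NATIONAL/NOMINAL DIGIT SHAPES, have bidi class BN. Here is a small check using Python's standard unicodedata module (the list name DEPRECATED_CONTROLS is just for illustration):

```python
import unicodedata

# Deprecated format controls U+206A..U+206F; their bidi class is BN.
DEPRECATED_CONTROLS = [chr(cp) for cp in range(0x206A, 0x2070)]

if __name__ == "__main__":
    # Bidi_Mirrored property (1 = mirrored in RTL runs) and bidi class.
    for ch in "/\\()":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"mirrored={unicodedata.mirrored(ch)}, class={unicodedata.bidirectional(ch)}")
    for ch in DEPRECATED_CONTROLS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"class={unicodedata.bidirectional(ch)}")
```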
Received on Sun Sep 18 2011 - 22:47:17 CDT
