Re: Directionality Standard

From: Asmus Freytag (
Date: Thu Dec 20 2007 - 02:31:04 CST



    Adding an LRM or RLM at the head of the paragraph allows the Unicode text
    itself to carry an indication of the desired top-level directionality.
    That indication will be picked up by any implementation of the *default*
    algorithm (but is easily overridden by any external markup in protocols
    that support it). The way it works is that the mark counts as a letter
    with strong directionality, and so serves as the first strong character
    used for setting the top-level directionality, while being otherwise
    invisible in the display.
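
As a rough illustration of the first-strong rule described above (rules P2 and P3 of UAX #9), here is a minimal Python sketch. The function name `default_paragraph_level` is made up for this example, and the sketch assumes plain text with no directional isolates (which later versions of the algorithm skip over). It relies only on `unicodedata.bidirectional` from the standard library.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R


def default_paragraph_level(text):
    """Return 0 (LTR) or 1 (RTL) from the first strong character.

    Simplified sketch: scans for the first character whose bidi class
    is L, R, or AL, and falls back to level 0 if there is none.
    Isolate handling is deliberately left out (assumption).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0  # no strong character found: default to LTR


hebrew_first = "\u05d0\u05d1\u05d2 abc"          # starts with Hebrew letters
print(default_paragraph_level(hebrew_first))        # first strong is R
print(default_paragraph_level(LRM + hebrew_first))  # LRM is seen first
print(default_paragraph_level(RLM + "abc"))         # RLM is seen first
```

The mark changes the result only because it is the first strong character the scan meets; a protocol that supplies the direction externally would simply not call the default at all.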


    On 12/19/2007 3:02 PM, Kent Karlsson wrote:
    > Stephane Bortzmeyer wrote:
    >>> Can't a Hebrew site have a news item in Hebrew, with a long quotation
    >>> from a speech by an American politician in English, in an ltr paragraph?
    >> Yes, and Unicode handles it fine, in plain text, without the need for
    >> support from a markup language (because each Unicode character has a
    >> direction).
    > No, that's not the issue. The display of a line of bidi text (with
    > actual mix of directions) becomes completely different depending on
    > the top level paragraph direction. That is NOT derived from "each
    > Unicode character has a direction" (considering just those that
    > have strong directionality).
    > The initial poster in this thread gave a good example. But here is a
    > simpler one, using the convention that uppercase denotes RTL letters.
    > The *same* input text, in logical order "ABCdefGHI", is displayed as
    >   CBAdefIHG if the top-level direction is LTR (a.k.a. level 0)
    >   IHGdefCBA if the top-level direction is RTL (a.k.a. level 1)
    > The top-level paragraph direction is not inherent in the text (and
    > *cannot* be). The bidi algorithm does specify a default, but it is only
    > a default, usually overridden by markup (or a language tag) when markup
    > (or a language tag) is available, since the default is not stable under
    > editing (unless the editor forces the use of an LRM or RLM character at
    > the beginning of each paragraph).
    > /kent k
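
Kent's "ABCdefGHI" example can be reproduced with a toy model. The sketch below is not a bidi implementation: it assumes the uppercase-means-RTL convention from the quoted message, treats every other character as LTR, ignores neutrals, numbers, and explicit controls, and applies only the reversal step (rule L2 of UAX #9). The function name `toy_display` is hypothetical.

```python
def toy_display(logical, base_level):
    """Reorder `logical` for display under the given paragraph level.

    Toy convention (assumption): uppercase letters stand for RTL
    letters, everything else is treated as LTR.
    """
    # RTL letters sit at level 1 under either base level; LTR letters
    # sit at level 0 in an LTR paragraph and at level 2 in an RTL one.
    ltr_level = 0 if base_level == 0 else 2
    levels = [1 if ch.isupper() else ltr_level for ch in logical]
    chars = list(logical)

    # Rule L2: from the highest level down to level 1, reverse every
    # maximal run of characters at that level or higher.
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


print(toy_display("ABCdefGHI", 0))  # CBAdefIHG
print(toy_display("ABCdefGHI", 1))  # IHGdefCBA
```

The input string is identical in both calls; only the paragraph level passed in differs, which is the point of the example.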
