Re: basic-hebrew RtL-space ?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 01 2004 - 15:16:31 CST

    From: "kefas" <pmr@informatik.uni-frankfurt.de>
    > Inserting unicode/basic-hebrew results in a convenient
    > RtL, right-to-left, advance of the cursor, but the
    > space-character jumps to the far right. Is there a
    > RtL-space?
    > In MS-Word and OpenOffice I can only change whole
    > paragraphs to RtL-entry. But quoting just a few
    > words in hebrew WITHIN a paragraph would be helpful to
    > many.

    This is what the embedding controls are made for:
    - surround an RTL subtext (Hebrew, Arabic...) within an LTR paragraph
    (Latin...) with an RLE/PDF pair;
    - surround an LTR subtext (Latin...) within an RTL paragraph (Hebrew...)
    with an LRE/PDF pair.
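
    As a small illustration of the two bullets above, here is a Python
    sketch that wraps a run of text in the embedding controls. The helper
    names embed_rtl and embed_ltr and the sample sentence are made up for this
    example; only the three code points come from Unicode.

```python
# Bidi embedding controls (code points defined by Unicode).
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_rtl(text: str) -> str:
    """Wrap an RTL run (Hebrew, Arabic...) for use inside an LTR paragraph."""
    return f"{RLE}{text}{PDF}"


def embed_ltr(text: str) -> str:
    """Wrap an LTR run (Latin...) for use inside an RTL paragraph."""
    return f"{LRE}{text}{PDF}"


# Hypothetical example: two Hebrew words quoted inside an English sentence.
hebrew_quote = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"
sentence = "He wrote " + embed_rtl(hebrew_quote) + " in his letter."
```

    The space between the two Hebrew words stays a plain U+0020; the RLE/PDF
    pair is what keeps the whole quote together as one right-to-left run.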

    There's no need for a separate RTL space: the regular ASCII SPACE
    (U+0020) character is used within all RTL texts as the standard default word
    separator. It has no strong directionality of its own (the Bidi algorithm
    classes it as whitespace, a neutral type), so it does not force a direction
    break; its direction is inherited from the surrounding text.
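
    This can be checked against the Unicode character database. The
    sketch below uses Python's standard unicodedata module; the helper name
    bidi_class is only a convenience for this example.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Bidi class that the Unicode database assigns to ch."""
    return unicodedata.bidirectional(ch)


print(bidi_class(" "))       # WS  (whitespace, no strong direction)
print(bidi_class("a"))       # L   (strong left-to-right)
print(bidi_class("\u05D0"))  # R   (HEBREW LETTER ALEF, strong right-to-left)
print(bidi_class("\u0627"))  # AL  (ARABIC LETTER ALEF, strong right-to-left)
```

    The same U+0020 therefore serves in both LTR and RTL runs; only its
    neighbours decide where it ends up.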

    A good question, however, is whether the space should inherit its direction
    from the previous text or from the next one.
    - If the previous text has a strong directionality, then the space should
    inherit that direction. This should be the case every time you are entering
    text with a space at the end: it's very disturbing to see the new space jump
    to the opposite side when entering space-separated Hebrew words within
    a Latin text, because the editor assumes that no more Hebrew will be added
    on the same line. (This causes surprising editing errors, for example when
    creating a translation resource file where translated resources are prefixed
    by an ASCII key, such as a .po file for GNU programs using gettext().)
    - If the previous text in the same paragraph has no directionality, then the
    space inherits its direction from the text after it (if that text has a
    strong directionality).
    - If this does not work either, then a global context for the whole text
    should be used, or alternatively the directionality of the end of the
    previous paragraph. (This influences where the cursor goes to align such a
    weakly-directed paragraph with the previous paragraph, including the default
    start margin position.)
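
    The three fallbacks above could be sketched roughly as follows. This
    is an editor-side heuristic, not the Bidi algorithm itself, and it rests on
    assumptions of mine: the "previous text" is taken to be the nearest
    strongly-directed character before the space in the same paragraph, and
    the global context is passed in as a plain default argument. The function
    names and the sample resource line are hypothetical.

```python
import unicodedata
from typing import Optional


def strong_direction(ch: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' for strongly-directed characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):
        return "rtl"
    return None


def space_direction(paragraph: str, index: int, default: str = "ltr") -> str:
    """Direction for the space at paragraph[index] while the user is typing."""
    # 1. Prefer the nearest strong character before the space.
    for ch in reversed(paragraph[:index]):
        direction = strong_direction(ch)
        if direction is not None:
            return direction
    # 2. Otherwise look at the text after the space.
    for ch in paragraph[index + 1:]:
        direction = strong_direction(ch)
        if direction is not None:
            return direction
    # 3. Otherwise fall back to the global context (or previous paragraph).
    return default


# Hypothetical resource line: an ASCII key, then a Hebrew word and a new space.
resource_line = "greeting=\u05E9\u05DC\u05D5\u05DD "
print(space_direction(resource_line, len(resource_line) - 1))  # rtl
```

    With this rule the freshly typed space stays next to the Hebrew word
    instead of jumping to the other side of the line.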

    The regular Bidi algorithm should be used to render a complete text, but
    strict Bidi rules should not be obeyed at every moment while composing a
    text. There, the current cursor position should act as a sentence break with
    a strong inherited directionality; the text can then be redirected at this
    position when the cursor moves to other parts of the text.

    I don't think this is an issue of renderers but of editors. In Notepad
    notably, you won't know exactly where a space will be entered while editing,
    unless you use the contextual menu that allows switching the global default
    directionality and swapping the alignment to the side margins. Sometimes,
    when you want to know where the LRE/RLE and PDF Bidi controls are, it's
    nearly impossible to determine this visually in Notepad, unless you use an
    external tool such as native2ascii, from the Java SDK, to change the
    encoding into one with clearly visible marks. It's unfortunate, given that
    Notepad (since Windows XP) offers a directly accessible contextual menu to
    enter Bidi controls and to change the global direction and alignment to the
    side margins. (But Notepad has a "visible controls" editing mode to resolve
    such ambiguities.)
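
    If no such tool is at hand, a few lines of Python can make the
    invisible controls visible. This is only a sketch in the spirit of what
    native2ascii does, not a reimplementation of it: the function names and the
    bracketed labels are my own choice, and the escaping helper only covers
    characters in the BMP.

```python
# Labels for the invisible Bidi marks and controls (names chosen for display).
BIDI_CONTROLS = {
    "\u200E": "LRM",
    "\u200F": "RLM",
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
}


def reveal_bidi_controls(text: str) -> str:
    """Replace each Bidi control with a visible [NAME] tag."""
    return "".join(
        f"[{BIDI_CONTROLS[ch]}]" if ch in BIDI_CONTROLS else ch for ch in text
    )


def escape_non_ascii(text: str) -> str:
    """Write every non-ASCII character as a \\uXXXX escape (BMP only)."""
    return "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04x}" for ch in text)


sample = "abc \u202B\u05D0\u05D1\u202C def"
print(reveal_bidi_controls(sample))
print(escape_non_ascii(sample))  # abc \u202b\u05d0\u05d1\u202c def
```

    The first form keeps the Hebrew readable and only tags the controls;
    the second is pure ASCII, so no editor can reorder it.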

    > Related: The other Hebrew characters in the alphabetic
    > presentation forms insert themselves in LtR-fashion?
    > Why this difference?
    > I read about Logical and Visual entry, but don't see
    > how that answers my 2 questions above.

    Visual entry should never be used. It was used for some legacy encodings to
    render text on devices that don't implement the Bidi algorithm and can only
    render text as LTR. Nobody enters RTL text in "pseudo-visual" LTR order;
    only the logical input order is needed.

    But don't confuse the input order with the encoding order, as they can be
    different (they should not be if the text is converted and stored in
    Unicode, where only the logical order is legal for any mix of Latin, Greek,
    Cyrillic, Hebrew, and Arabic).

    The case of Thai is different because its input order is (historically)
    visual rather than logical, and the text is then encoded using the same
    (visual) order. This was not changed for Thai in Unicode, to keep
    compatibility with the national Thai standard TIS-620 (and its later
    revisions). So even though Thai uses a non-logical order, its input order
    and encoding order are the same.
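
    A quick way to see this for Thai is to list the code points of a
    syllable whose vowel is written to the left of its consonant. The syllable
    below is just an example I picked; the character names come from Python's
    unicodedata module.

```python
import unicodedata

# A Thai syllable whose vowel sign is written to the left of the consonant.
syllable = "\u0E40\u0E01"

# List the characters in their stored (encoding) order.
names = [unicodedata.name(ch) for ch in syllable]
for ch, name in zip(syllable, names):
    print(f"U+{ord(ch):04X} {name}")
# U+0E40 THAI CHARACTER SARA E
# U+0E01 THAI CHARACTER KO KAI
```

    The vowel SARA E comes first in the stored text, exactly as it is typed
    and as it appears on the page, although the consonant KO KAI is
    pronounced before it.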

    The difference between encoding orders is known mainly from historic texts
    created for modern Hebrew, and more rarely Arabic, or from texts encoded in
    a private pre-press encoding used to prepare the global layout of pages.
    (These texts are processed more easily and quickly in complex page layouts
    if they are prepared in visual order before being flowed into the page
    layout template; such applications use specific encodings in a richer
    rendering context than plain text, so this is outside the scope of the
    Unicode standard itself.)


