Re: bidi in unipad

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Feb 12 2003 - 02:16:55 EST

    Chris Jacobs <c dot t dot m dot jacobs at hccnet dot nl> wrote:

    > I did not mean to ask to display the ASCII sequence "\u05e0" as
    > "05e0u\"
    > I meant something else.

    I THINK what you meant is that the ASCII sequence "\u05e0", surrounded
    on both sides by real RTL characters, should appear as one continuous
    RTL string, instead of breaking the real RTL characters into two
    separate strings. The cursor would move LTR through the ASCII
    characters \ u 0 5 e 0 but RTL overall through the string.

    This isn't how the bidirectional algorithm works with ASCII characters,
    though. Each Unicode character has a directionality property (its
    Bidi_Class). Some are strong LTR or RTL, some are weak (digits and
    number-related punctuation, for instance), and some are neutral --
    their directionality is completely determined by the characters around
    them. (Sort of like the politics of some people I know.)
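    A quick way to see these properties is Python's standard
    unicodedata.bidirectional(), which returns a character's Bidi_Class as a
    short code ("L", "R", "EN", "ON", "WS", and so on). This is a minimal
    sketch; the helper name bidi_classes is purely illustrative:

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode Bidi_Class value (illustrative helper)."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


print(bidi_classes("\u05e0"))   # the real character, HEBREW LETTER NUN
print(bidi_classes("\\u05e0"))  # the six-character ASCII escape spelling it out
```

    The real NUN comes back as strong "R". In the escape, only the letters
    "u" and "e" are strong "L"; the digits are weak European Numbers ("EN")
    and the backslash is neutral ("ON").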

    ASCII letters are strong LTR (ASCII digits are weak, and most ASCII
    punctuation, including the backslash, is neutral), which means the
    letters will break up an RTL sequence in the manner you are seeing.
    This is true even if the ASCII
    characters combine to form a commonly understood notation representing
    some other Unicode character. The bidirectional algorithm doesn't do
    any form of semantic analysis on the text. To do so would constitute a
    customized, or I guess the word now is "tailored," version of the
    bidirectional algorithm, which might be great for some purposes such as
    the one you describe, but which wouldn't be Unicode-conformant. UniPad
    is simply being Unicode-conformant in this regard.
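    To make the run-splitting concrete, here is a deliberately simplified
    sketch of level resolution for a right-to-left paragraph. It is NOT a
    conformant UAX #9 implementation: it assumes a single RTL paragraph with
    no explicit embeddings, brackets or line-level rules, and models only
    rules W7, N1/N2 and I2. The function names are hypothetical.

```python
import unicodedata


def resolve_levels_rtl(text):
    """Return a simplified embedding level per character (1 = RTL, 2 = LTR).

    Assumes one right-to-left paragraph (base level 1), no explicit
    embeddings; only UAX #9 rules W7, N1/N2 and I2 are modelled.
    """
    types = []
    last_strong = "R"  # start-of-sequence type for an RTL paragraph
    for ch in text:
        c = unicodedata.bidirectional(ch)
        if c in ("R", "AL"):
            types.append("R")
            last_strong = "R"
        elif c == "L":
            types.append("L")
            last_strong = "L"
        elif c == "EN":
            # W7: a European number after strong L text resolves to L
            types.append("L" if last_strong == "L" else "EN")
        elif c == "AN":
            types.append("AN")
        else:
            types.append("N")  # simplification: everything else is neutral

    def as_strong(t):
        return "L" if t == "L" else "R"  # N1 treats EN and AN as R

    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = as_strong(types[i - 1]) if i > 0 else "R"
        after = as_strong(types[j]) if j < len(types) else "R"
        # N1: neutrals take the surrounding direction if both sides agree;
        # N2: otherwise they take the paragraph (embedding) direction, R
        fill = before if before == after else "R"
        types[i:j] = [fill] * (j - i)
        i = j

    # I2: at odd level 1, L, EN and AN rise to level 2; R stays at level 1
    return [2 if t in ("L", "EN", "AN") else 1 for t in types]


def split_runs(text):
    """Group consecutive characters that share a resolved level."""
    runs = []
    for ch, level in zip(text, resolve_levels_rtl(text)):
        if runs and runs[-1][1] == level:
            runs[-1] = (runs[-1][0] + ch, level)
        else:
            runs.append((ch, level))
    return runs


# Hebrew letters around a literal backslash-u escape (not the real U+05E0)
sample = "\u05d0\u05d1 \\u05e0 \u05d2\u05d3"
for run, level in split_runs(sample):
    print(level, repr(run))
```

    On this sample the escape's letters and digits form their own level-2
    (LTR) run, so the Hebrew ends up in two separate RTL runs. The backslash
    sits between R and L text, so under N2 it falls back to the paragraph
    direction and joins the first RTL run. Replacing the escape with the
    real U+05E0 yields a single level-1 run.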

    -Doug Ewell
     Fullerton, California



    This archive was generated by hypermail 2.1.5 : Wed Feb 12 2003 - 03:00:31 EST