Re: Jumping Cursor. Was: Right-to-Left Punctuation Problem

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 01 2005 - 20:21:58 CDT


    > I assumed that "inherent" Arabic bidirectionality was
    > invented in the wee hours of computer history, maybe in the early
    > sixties, so it never occurred to me that anybody on this list might take
    > it personally.

    Dear me, unexamined presuppositions can be a problem, can't they? ;)

    Visual order Arabic and Hebrew implementations on computers were
    probably "invented" in the 70's, and saw fairly widespread use
    in that timeframe on mainframes and later in the 80's on PCs. A
    lot of that work was done by IBM. An inherent bidirectionality
    algorithm was invented at Xerox PARC in the 80's, I think, although
    others might have had an earlier hand in it. It was implemented
    on the Xerox Star system in that timeframe. You can see it
    discussed in Joe Becker's 1984 Scientific American article, for
    example. And that was the immediate precursor of Arabic and Hebrew
    support on the Macintosh, as well as the inspiration for the
    Unicode bidirectional algorithm.

    [Some historians on the list can, no doubt, nail this stuff down
    more precisely...]

    > I really do
    > not understand the assertions that e.g. rtl digits would be a big
    > problem, for reasons that I've explained on other messages. Which makes
    > me think there's something I'm overlooking. That's all.

    Yes, you are.

    Cloning *any* common characters -- let alone all the digits, all
    the common punctuation, and SPACE -- on the basis of directionality
    differences, *would* wreak havoc on information processing. Many
    of the characters in question are in ASCII, which means they
    are baked into hundreds of formal languages, thousands of protocols,
    and tens of thousands of programs and software systems. They have
    been for decades now, and that *includes* Arabic and Hebrew
    information processing systems.

    Making the SPACE character in Arabic and Hebrew be something *other*
    than U+0020 SPACE, simply because it might make bidirectional
    editors easier to write if all characters were inherently RTL for
    Arabic, would have the effect of breaking nearly all Arabic
    and Hebrew information processing, deep down in the guts where
    end users can't get at it. The *only* way around it would be to
    introduce such things effectively all pre-deprecated with canonical
    equivalences to the existing characters, so that at least normalized
    data would behave correctly and be interpreted correctly. But then
    there would be no supportable reason for introducing them in
    the first place.
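
To make the two halves of that argument concrete, here is a minimal Python sketch. It assumes a hypothetical cloned "RTL space"; no such character exists, so the private-use code point U+E020 stands in for it, and `parse_request_line` is an invented helper. The second part uses a real canonical singleton, U+2126 OHM SIGN, to show how normalization folds a canonically equivalent duplicate back onto the original character, which is roughly what a "pre-deprecated" clone would amount to.

```python
import unicodedata

# Hypothetical stand-in for a cloned "RTL SPACE". Unicode has no such
# character; the private-use code point U+E020 is used only for illustration.
HYPOTHETICAL_RTL_SPACE = "\uE020"


def parse_request_line(line):
    """Split a protocol line on U+0020, as ASCII-based parsers typically do."""
    return line.split("\u0020")


request = "GET /index.html HTTP/1.1"
ok = parse_request_line(request)
broken = parse_request_line(request.replace(" ", HYPOTHETICAL_RTL_SPACE))

print(len(ok))      # the parser finds three fields
print(len(broken))  # the parser finds a single unsplit field

# A real example of a duplicate with a canonical equivalence:
# U+2126 OHM SIGN normalizes to U+03A9 GREEK CAPITAL LETTER OMEGA.
normalized = unicodedata.normalize("NFC", "\u2126")
print(f"U+{ord(normalized):04X}")
```

After normalization the duplicate is gone from the data, which is why a canonically equivalent clone would buy nothing.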

    And you haven't thought through the consequences of having duplicated
    digits with different directionality. You might think an end
    user has complete control over what they do, with their keyboard
    and their choice of characters -- but text is now *global* data,
    and much of what goes on with data is automated, and consists
    of programs talking to programs through protocols. Once you unleash
    different users using what claims to be the *same* character
    encoding, but with opposite conventions about *which* digits they
    use and what direction those flow, you will inevitably get
    into the situation where one process or another cannot reliably
    tell whether "1234" is to be interpreted as 1234 or 4321. That alone
    is enough for the whole proposal to be completely dead in the water.
    All the proposal would accomplish is to create massive ambiguity
    about what the representation of a given piece of Hebrew or
    Arabic text should be -- and that is a *bad* thing in a character
    encoding.
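
The ambiguity can be sketched in a few lines of Python. The two decoder functions below are hypothetical; they model two senders that store the same four digit code points under opposite conventions, one most significant digit first and one least significant digit first. A receiver that sees only the stored sequence has no way to tell which convention applied.

```python
def decode_most_significant_first(digits):
    """Convention A (hypothetical): the first stored digit is the most significant."""
    return int(digits)


def decode_least_significant_first(digits):
    """Convention B (hypothetical): the first stored digit is the least significant."""
    return int(digits[::-1])


stored = "1234"  # the same code point sequence arrives from both senders

candidates = {
    decode_most_significant_first(stored),
    decode_least_significant_first(stored),
}
print(sorted(candidates))  # two different numeric values for one sequence
```

Nothing in the data itself selects between the two readings, so any automated process downstream has to guess.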

    > Then again, I
    > really do not understand why anybody would think RTL languages are
    > inherently bidi, so maybe there's no point

    Well, first of all, nobody has claimed that the Arabic *language*
    is inherently bidi. Nor has anybody claimed that the Arabic *script*
    is inherently bidi. So try understanding what the people implementing
    these systems *are* claiming.

    Any functional information processing system concerned with
    textual layout that is aimed at the Hebrew or Arabic language
    markets *must* support bidirectional layout of text. That is
    simply a fact.

    Furthermore, to do so interoperably -- that is, with the hope
    that Implementation A by Company X will lay out the same underlying
    text as Implementation B by Company Y in the same order, so that
    a human sees and reads it as the "same" text -- they depend on
    a well-defined encoding of the characters and a well-defined
    bidirectional layout algorithm. One possible choice is consistent
    visual ordering. One possible choice is consistent logical ordering
    and an inherent bidirectional algorithm. The Unicode Standard
    chose the latter, for a number of very good reasons. Trying
    to mix the two is a quick road to hell.
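
As a rough illustration of the logical-ordering approach, the Python sketch below reads the bidirectional class that the Unicode Character Database assigns to a few characters, using the standard `unicodedata` module. The choice of sample characters is mine. The point it shows is that letters carry a strong direction, while the shared digits and SPACE do not; a single U+0020 serves every script, and its display position is resolved from context by the bidirectional algorithm rather than by encoding a second copy.

```python
import unicodedata

# Sample characters: a Latin letter, a Hebrew letter, an Arabic letter,
# a European digit, an Arabic-Indic digit, and SPACE.
chars = ["A", "\u05D0", "\u0627", "1", "\u0661", " "]

# Look up the bidirectional class of each character.
classes = {ch: unicodedata.bidirectional(ch) for ch in chars}

for ch, cls in classes.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

Here L, R, and AL are the strong classes (left-to-right, right-to-left, and Arabic letter), EN and AN are the weak number classes, and WS is whitespace, which is neutral.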

    --Ken

    >
    > -g
    >
    >


