Re: [bidi] Bidi demo

From: Mark Davis (mark@macchiato.com)
Date: Wed Apr 29 2009 - 19:30:57 CDT


    I made some of those fixes, so let me know if there are further problems.

    The ASCII mapping I am using has the normal BIDI class values, with the
    following overrides.

        asciiHackMap.put(']', LRE);
        asciiHackMap.put('[', RLE);
        asciiHackMap.put('}', LRO);
        asciiHackMap.put('{', RLO);
        asciiHackMap.put('|', PDF);
        asciiHackMap.putAll(new UnicodeSet("[A-M]"), R);
        asciiHackMap.putAll(new UnicodeSet("[N-Z]"), AL);
        asciiHackMap.putAll(new UnicodeSet("[5-9]"), AN);
        asciiHackMap.put('>', L);
        asciiHackMap.put('<', R);
        asciiHackMap.put('"', NSM);
        asciiHackMap.put('_', BN);
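
    A minimal sketch of how an override map like this might be applied: look
    up the override first and fall back to the character's normal Bidi class
    otherwise. This is not the demo's actual code. The class name
    AsciiHackSketch and the method bidiClass are hypothetical, class names
    are plain strings, and java.lang.Character.getDirectionality stands in
    for whatever property lookup the demo really uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the overrides are copied from the list above,
// everything else is illustrative scaffolding.
final class AsciiHackSketch {
    private static final Map<Character, String> OVERRIDES = new HashMap<>();

    static {
        OVERRIDES.put(']', "LRE");
        OVERRIDES.put('[', "RLE");
        OVERRIDES.put('}', "LRO");
        OVERRIDES.put('{', "RLO");
        OVERRIDES.put('|', "PDF");
        for (char c = 'A'; c <= 'M'; c++) OVERRIDES.put(c, "R");
        for (char c = 'N'; c <= 'Z'; c++) OVERRIDES.put(c, "AL");
        for (char c = '5'; c <= '9'; c++) OVERRIDES.put(c, "AN");
        OVERRIDES.put('>', "L");
        OVERRIDES.put('<', "R");
        OVERRIDES.put('"', "NSM");
        OVERRIDES.put('_', "BN");
    }

    // Returns the override if one exists, else the normal Bidi class.
    static String bidiClass(char c) {
        String override = OVERRIDES.get(c);
        if (override != null) {
            return override;
        }
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT: return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT: return "R";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC: return "AL";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER: return "EN";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR: return "ES";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR: return "ET";
            case Character.DIRECTIONALITY_ARABIC_NUMBER: return "AN";
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR: return "CS";
            case Character.DIRECTIONALITY_NONSPACING_MARK: return "NSM";
            case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL: return "BN";
            case Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR: return "B";
            case Character.DIRECTIONALITY_SEGMENT_SEPARATOR: return "S";
            case Character.DIRECTIONALITY_WHITESPACE: return "WS";
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING: return "LRE";
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE: return "LRO";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING: return "RLE";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE: return "RLO";
            case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT: return "PDF";
            default: return "ON"; // sketch: anything not listed is treated as ON
        }
    }
}
```

    With these overrides, uppercase ASCII letters stand in for right-to-left
    letters while lowercase letters keep their normal class L, so a mixed
    test string can be typed on an ordinary keyboard.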

    I have not tried reconciling those with Asmus's values, which appear to be:

    int TypesFromChar[] =
    {
    // 0    1    2    3    4    5    6    7    8    9    a    b    c    d    e    f
       ON,  ON,  ON,  ON,  L,   R,   ON,  ON,  ON,  ON,  ON,  ON,  ON,  B,   RLO, RLE, /*00-0f*/
       LRO, LRE, PDF, WS,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  ON,  /*10-1f*/

       WS,  ON,  ON,  ON,  ET,  ON,  ON,  ON,  ON,  ON,  ON,  ET,  CS,  ON,  ES,  ES,  /*20-2f*/
       EN,  EN,  EN,  EN,  EN,  EN,  AN,  AN,  AN,  AN,  CS,  ON,  ON,  ON,  ON,  ON,  /*30-3f*/
       R,   AL,  AL,  AL,  AL,  AL,  AL,  R,   R,   R,   R,   R,   R,   R,   R,   R,   /*40-4f*/
       R,   R,   R,   R,   R,   R,   R,   R,   R,   R,   R,   ON,  B,   ON,  ON,  ON,  /*50-5f*/
       NSM, L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   /*60-6f*/
       L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   L,   ON,  S,   ON,  ON,  ON,  /*70-7f*/
    };

    http://www.unicode.org/reports/tr9/BidiReferenceCpp/bidi.c.txt
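
    For comparison with the override list, the array above can be read as a
    plain lookup table indexed by ASCII code point. The sketch below
    transcribes it into Java with the class names as strings. The names
    AsciiTypeTable, typeOf and size are hypothetical and do not appear in
    bidi.c.

```java
// Hypothetical transcription of the TypesFromChar array quoted above.
final class AsciiTypeTable {
    private static final String[] TYPES = {
        // 00-0f
        "ON", "ON", "ON", "ON", "L", "R", "ON", "ON",
        "ON", "ON", "ON", "ON", "ON", "B", "RLO", "RLE",
        // 10-1f
        "LRO", "LRE", "PDF", "WS", "ON", "ON", "ON", "ON",
        "ON", "ON", "ON", "ON", "ON", "ON", "ON", "ON",
        // 20-2f
        "WS", "ON", "ON", "ON", "ET", "ON", "ON", "ON",
        "ON", "ON", "ON", "ET", "CS", "ON", "ES", "ES",
        // 30-3f
        "EN", "EN", "EN", "EN", "EN", "EN", "AN", "AN",
        "AN", "AN", "CS", "ON", "ON", "ON", "ON", "ON",
        // 40-4f
        "R", "AL", "AL", "AL", "AL", "AL", "AL", "R",
        "R", "R", "R", "R", "R", "R", "R", "R",
        // 50-5f
        "R", "R", "R", "R", "R", "R", "R", "R",
        "R", "R", "R", "ON", "B", "ON", "ON", "ON",
        // 60-6f
        "NSM", "L", "L", "L", "L", "L", "L", "L",
        "L", "L", "L", "L", "L", "L", "L", "L",
        // 70-7f
        "L", "L", "L", "L", "L", "L", "L", "L",
        "L", "L", "L", "ON", "S", "ON", "ON", "ON",
    };

    // Number of entries in the table (one per ASCII code point).
    static int size() {
        return TYPES.length;
    }

    // Looks up the test class for an ASCII character; rejects anything else.
    static String typeOf(char c) {
        if (c >= TYPES.length) {
            throw new IllegalArgumentException("not ASCII: U+" + Integer.toHexString(c));
        }
        return TYPES[c];
    }
}
```

    Read this way, the two mappings visibly differ: for example, this table
    gives 'A' the class AL and '6' the class AN, while the overrides above
    give 'A' the class R and start AN at '5'.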

    Mark

    On Tue, Apr 28, 2009 at 20:40, Mark Davis <mark@macchiato.com> wrote:

    >
    > On Tue, Apr 28, 2009 at 06:28, Matitiahu Allouche <matial@il.ibm.com> wrote:
    >
    >>
    >> Hello, Mark!
    >>
    >> This demo is useful, and quite nicely done. A few remarks.
    >
    >
    > Thanks, and thanks for the comments.
    >
    >
    >>
    >> 1) By default, base level 1 is assumed. A check box (LTR paragraph)
    >> allows forcing the base level to 0.
    >> The default behavior is not quite conformant to the UBA (rule P2). I
    >> suggest to replace the check box by 3 radio buttons for UBA default, forced
    >> LTR and forced RTL respectively.
    >
    >
    > I agree. I did pretty much throw it together, so I didn't expose all three
    > choices, but I can make it either a pull-down or radio buttons.
    >
    >
    >>
    >> 2) The checkbox for "ASCII Hack" may not be understood by casual Bidi
    >> overseekers. The section added at the end of the page when checking the box
    >> can easily fall beyond the current screenful so that the user will not even
    >> be aware that something has happened.
    >> I suggest to add a short explanation close to the checkbox and a reference
    >> to the added section.
    >
    >
    > Agreed. What I really need to do is supply much more of a description.
    >
    >>
    >>
    >> 3) The characters in your ASCII hacking table are different from those
    >> chosen by Asmus Freytag in his Bidi Tool (part of the Unibook application),
    >> for no benefit that I can see. I suggest to align your table with Asmus's,
    >> if for no other reason than that he was the first, so that we veteran Bidi
    >> dabblers are used to it.
    >
    >
    > I basically just went with the characters that are in
    > http://unicode.org/reports/tr9/BidiReferenceJava/BidiReferenceTestCharmap.java.txt,
    > plus adding others so as to cover all the classes. I can definitely change
    > those, although if they differ across versions of reference code we'll want
    > to fix it. (For others, this is not an intrinsic part of the algorithm, just
    > for testing.) Where are the Unibook ones listed?
    >
    >
    >>
    >>
    >> 4) The ASCII Hack characters used for ES, ET and CS should be chosen among
    >> characters which really have this classification in the latest versions of
    >> Unicode. Putting Plus and Hyphen-Minus signs in the ET class sets us back
    >> to Unicode 3.x and might reopen an old quarrel with Microsoft (joking :-).
    >> Also, Solidus is really CS and is a bad representative for ES.
    >>
    >> 5) The 001C-001E characters in the B class are rendered as square blocks
    >> in my browser (and probably anybody else's). Since they are not easily
    >> generated from a keyboard, I suggest to just remove them.
    >>
    >> 6) 000C is really WS and is not a good representative for the B class.
    >> The other representatives of this class are not printable. I suggest to
    >> add names and/or hex codes in a comment column.
    >>
    >> 7) All the characters in the S class are not good choices, being either
    >> not easily generated from the keyboard (000B, 001F) or being intercepted by
    >> the browser (0009). I suggest to remove those and add some printable ASCII
    >> character.
    >>
    >> 8) Same thing for the WS class: I suggest to add name and/or hex code in
    >> a comment column.
    >>
    >> 9) Your ASCII Hack table has no representatives for LRM and RLM. I
    >> suggest to use @ for LRM and & for RLM.
    >
    >
    > I used > and <.
    >
    >
    >>
    >>
    >> 10) The string "abc\nde" (keying Enter between "abc" and "de") causes a
    >> server internal error when pressing the "Show Bidi" button.
    >
    >
    > Ah, yes, I didn't check for multiple lines; I'll fix that.
    >
    >
    >>
    >>
    >>
    >> Shalom (Regards), Mati
    >> Bidi Architect
    >> Globalization Center Of Competency - Bidirectional Scripts
    >> IBM Israel
    >> Phone: +972 2 5888802 Fax: +972 2 5870333
    >> Mobile: +972 52 2554160
    >>
    >>
    >>
    >> From: Mark Davis <mark@macchiato.com>
    >> Sent by: bidi-bounce@unicode.org
    >> Date: 28/04/2009 03:29
    >> To: "bidi@unicode.org" <bidi@unicode.org>
    >> cc: Unicode <unicode@unicode.org>
    >> Subject: [bidi] Bidi demo
    >>
    >>
    >>
    >>
    >> I posted a bidi demo at http://unicode.org/cldr/utility/bidi.jsp
    >>
    >> For a given sample string, it shows the results of applying the bidi
    >> algorithm *and* the rules responsible for each character's resulting
    >> level. (The UI isn't polished; I threw it together using off-the-shelf
    >> components, and some small modifications to the UBA reference code to
    >> capture the rules.) The default sample is chosen to invoke most of the
    >> rules. Comments are welcome.
    >>
    >> Mark
    >>
    >
    >


