Re: Directionality Standard

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Jan 11 2008 - 04:12:02 CST


    Hello Waleed Oransa,

    you have written:
    > What we need is a standard way to encode the directionality
    > of the text that all Unicode-compliant component vendors respect.
    ...
    > That [it's still up to the authors to use BiDi embedding controls
    > when and where needed] is the reason why it does not solve the
    > problem because it's optional!!

    With due respect, I deem your contribution inconsistent:
    You are asking for a standard way to encode the directionality
    (which certainly must be specified by the author), and then
    you tell us that this would not solve the problem, because
    the author has to explicitly specify the directionality.

    > If you think in this as Arabic speaker who want to write English,
    > will you be happy to add such marks by your self each time

    As explained earlier in this thread, you do not have
    to insert such marks “each time”: They are only needed
    in two exceptional cases:
    - if a paragraph starts with an insertion of opposite
       directionality,
    - or when punctuation marks between runs of different
       directionalities belong to the inserted (rather than
       the surrounding) string.

    Whenever a paragraph starts with a string in the paragraph’s
    basic directionality, and contains an insertion in opposite
    directionality, the bidi algorithm will render it as intended.
    This will account for the vast majority of cases.

    Only the exceptional cases, as outlined above, will require
    an additional RLM, or LRM, respectively. So the burden on the
    authors is not unreasonably high.
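
    To illustrate the first exceptional case, here is a rough Python
    sketch of how a paragraph's base direction is derived from its
    first strong character when no higher-level protocol sets it. It
    is a simplification of what UAX #9 specifies (it ignores explicit
    embedding controls, for instance), and the function name
    first_strong_direction is made up for this example.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def first_strong_direction(paragraph):
    """Return 'ltr' or 'rtl' from the first strong character, else None.

    Simplified: only the bidi classes L, R and AL are considered strong,
    and explicit embedding controls are not treated specially.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


# An Arabic sentence that happens to start with a Latin insertion.
paragraph = "Unicode \u0647\u0648 \u0645\u0639\u064A\u0627\u0631"

print(first_strong_direction(paragraph))        # ltr (not what the author meant)
print(first_strong_direction(RLM + paragraph))  # rtl (the RLM settles it)
```

    The RLM is itself a strong right-to-left character, so a single
    invisible mark at the start is enough to settle the paragraph
    direction.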

    Of course, every author of bidirectional text should pay
    attention to directionality issues, and learn to unequivocally
    express his intents and desires.

    > or should the tools supports retaining of your original text
    > direction automatic.

    Of course, a well-designed editor could help with this issue:
    I can imagine an editor having a global setting, “preferred
    writing direction” say, that will automatically supply the
    RLM (or LRM) at the beginning of an “exceptional” paragraph
    starting with an insertion in the opposite direction. However,
    there is nothing an editor could do with the other exception:
    It is the author’s task to decide which punctuation marks
    belong to the insertion, and which to its surroundings.
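
    Such an editor feature might look roughly like the following
    Python sketch. It is purely hypothetical: the function name
    mark_exceptional_paragraphs, the "preferred" parameter and the
    choice of a newline as paragraph separator are assumptions made
    for this example, and the strong-character test is a simplified
    reading of UAX #9.

```python
import unicodedata

# Mark to prepend for each preferred writing direction (LRM, RLM).
MARKS = {"ltr": "\u200E", "rtl": "\u200F"}


def strong_direction(ch):
    """Map a character to 'ltr', 'rtl', or None if it is not strong."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):
        return "rtl"
    return None


def mark_exceptional_paragraphs(text, preferred):
    """Prepend LRM/RLM to paragraphs whose first strong character
    runs against the preferred writing direction."""
    mark = MARKS[preferred]
    fixed = []
    for para in text.split("\n"):
        directions = (strong_direction(ch) for ch in para)
        first = next((d for d in directions if d is not None), None)
        if first is not None and first != preferred:
            para = mark + para
        fixed.append(para)
    return "\n".join(fixed)


text = "\u0645\u0631\u062D\u0628\u0627 world\nUnicode \u0647\u0648"
marked = mark_exceptional_paragraphs(text, "rtl")
# Only the second paragraph, which starts with Latin text, gets an RLM.
print(marked.split("\n")[1].startswith("\u200F"))  # True
```

    Running the function a second time changes nothing, because the
    inserted mark is then the first strong character. The punctuation
    case is deliberately left alone, as no such heuristic can know
    what the author intended.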

    > Even inserting RLE is not available in Web based application.

    In HTML (a higher-level protocol), you would use the DIR
    attribute on the BODY element (specifying the base directionality
    of the whole document), or on any subordinate element (for a
    part of the document), as desired.
    Cf. <http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2>.

    Alternatively, you can insert any Unicode character (including
    format controls) via a numerical character reference. However,
    in HTML sources, the DIR attribute is recommended rather than
    inserting directionality-related format control characters.
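
    Both options can be sketched in a few lines of Python that
    generate the markup. The helper name wrap_with_dir is invented
    for this example; html.escape and html.unescape are from the
    Python standard library, and &#x200F; is the numerical character
    reference for the RLM (U+200F).

```python
import html


def wrap_with_dir(text, direction, tag="p"):
    """Wrap text in an element carrying an explicit dir attribute."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return f'<{tag} dir="{direction}">{html.escape(text)}</{tag}>'


# Recommended in HTML sources: state the direction on the element.
print(wrap_with_dir("Unicode", "rtl"))  # <p dir="rtl">Unicode</p>

# Alternative: a numerical character reference for the RLM itself.
RLM_REFERENCE = "&#x200F;"
print(html.unescape(RLM_REFERENCE) == "\u200F")  # True
```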

    I may have misunderstood your remark: If it is referring to
    data entry into a particular WWW-based form you are using,
    you are subject to the whim of its author, of course. At
    the very least, the form must accept Unicode data, and not
    filter out the format controls. In this case, starting your
    basically right-to-left data with an Arabic or Hebrew character
    should mend the situation (as discussed above). In the exceptional
    cases discussed above, you may be able to enter an RLM via a
    suitable keyboard, via cut-and-paste, or via other system
    utilities, depending on the software you are using.

    > of course this is a tool problem from your point of view but
    > I would say that is because no clear specification in Unicode
    > text and no clear guidelines regarding respect the directionality
    > as very important attribute of the text SAME as any Arabic or
    > Hebrew letter!

    Have you bothered to read the Unicode standard, particularly
    §16.2 <http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#G16327>
    and UAX #9 <http://www.unicode.org/reports/tr9/tr9-17.html>?

    If you find any omissions therein (or in the Implementation Guidelines
    <http://www.unicode.org/versions/Unicode5.0.0/ch05.pdf#G34785>),
    then go ahead and file your suggestion for improvement, at
    <http://www.unicode.org/reporting.html>. But be specific!
    Your rather vague accusations will not help towards improving
    the standard.

    Best wishes,
       Otto Stolz


