Re: Directionality Standard

From: Behnam (behnam.rassi@gmail.com)
Date: Mon Dec 17 2007 - 20:33:38 CST


    Thank you very much Mark.
    Should I conclude that whenever an rtl paragraph is not reproduced
    as rtl, the device or application is not Unicode compliant?
    That would apply to a great many of them!
    And why do we need a language tag in HTML or other formats if this
    property is defined within the encoded paragraph?
    -B

    On 17-Dec-07, at 8:58 PM, Mark Davis wrote:

    > There may be some misunderstanding. Unicode does define the default
    > direction of a paragraph for use with the bidi algorithm (which
    > determines the ordering of characters in text containing
    > bidirectional scripts like Arabic or Hebrew).
    >
    > See http://unicode.org/reports/tr9/
    >
    > Mark
    >
    > On Dec 17, 2007 4:23 PM, Behnam <behnam.rassi@gmail.com> wrote:
    > Thank you.
    > So the answer is no. Unicode does not define the directionality of
    > a paragraph. Then I guess my next question should be why?
    > I think I have some explaining to do.
    > Unicode defines a very complex bidi behaviour of characters, and it
    > defines the beginning and ending of a paragraph (I assume). Yet, it
    > doesn't define what directionality this paragraph should take to
    > arrange these characters within the paragraph.
    > Defining the directionality of a paragraph is more important than
    > defining the language of a text. Yes, a language tag can help
    > language-aware devices and applications behave accordingly. But
    > a directionality definition is not about 'user-friendly' behaviour
    > of a text; it is about reproducing the raw text as intended by its
    > Unicode encoding.
    > Understanding this issue, I suppose, may be very easy or very
    > difficult, depending on the extent to which you have been exposed
    > to rtl text. In the next paragraph, I write a Persian line with a
    > couple of English words thrown in, set in left-to-right
    > directionality, to give you an idea of what right-to-left users
    > experience on an everyday basis.
    > پرسش من از Unicode این است که چرا برای
    > پاراگراف directionality تبیین نکرده است.
    > In order to read the above phrase correctly in Persian, the order
    > of words should be as I numbered below (from right to left):
    > پرسش1 من2 از3 Unicode4 این5 است6 که7 چرا8
    > برای9 پاراگراف10 directionality11 تبیین12
    > نکرده13 است14.
    >
    > Of course I can set this paragraph in my application to "rtl" and,
    > thanks to the wonders of the bidi behaviour of characters,
    > everything will be put in place:
    >
    > پرسش من از Unicode این است که چرا برای
    > پاراگراف directionality تبیین نکرده است.
    >
    > But I have absolutely no guarantee that my rtl text in an email, in
    > a text message, in an online forum posting... will be received in
    > an rtl setting. This perfectly Unicode-encoded text is at the mercy
    > of applications, devices, media and platforms. And more likely than
    > not, my rtl paragraph will be received in ltr and in the order that
    > I numbered above! Even in more controlled situations such as word
    > processors, as a friend of mine has experienced, this Persian
    > phrase, written in the rtl setting of Nisus on a Mac, exported in
    > .doc format, and opened on a Windows platform, will produce an rtl
    > but 'Arabic' document! Not merely an Arabic-script document, which
    > it is, but an Arabic-language document!
    >
    > You can experience this dilemma yourself. Set your application to
    > rtl (which can be done in many applications) and write something in
    > English or any other Roman-script language. As long as the whole
    > phrase is Roman, you only get a misplaced final period at the far
    > left. But if you throw a couple of Hebrew words into the phrase,
    > you'll see what a wrong directionality setting can do to your
    > English. Of course, you are not normally exposed to this dilemma,
    > because the default directionality of all computerized devices and
    > applications is left to right. But it gives you an idea of what rtl
    > users go through on an everyday basis.
    >
    > Again, this is not about requesting a convenience. It is about
    > requesting Unicode to do what it set out to do. Unicode encodes the
    > bidi behaviour of characters, the beginning of a paragraph, the end
    > of a paragraph. It must encode its directionality too.
    >
    > Behnam
    >
    >
    > On 17-Dec-07, at 4:20 AM, Stephane Bortzmeyer wrote:
    >
    >> On Sat, Dec 15, 2007 at 11:08:40AM -0500,
    >> Behnam <behnam.rassi@gmail.com> wrote
    >> a message of 78 lines which said:
    >>
    >>> Is there any Unicode standard to identify a text? i.e. primary
    >>> script>directionality>language?
    >>
    >> Not a Unicode standard but, yes, there is a standard to tag texts to
    >> indicate language, script, etc. It's RFC 4646. See
    >> http://www.langtag.net/ for a start.
    >
    >
    >
    >
    > --
    > Mark
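
The default paragraph direction mentioned in the thread is derived in UAX #9 (rules P2 and P3) from the first strong directional character of the paragraph. A minimal Python sketch of that idea follows. The function name first_strong_direction and the sample strings are illustrative assumptions, and the sketch is a simplification: it ignores explicit embeddings and isolates, and it does not model a higher-level protocol (such as an application's own rtl setting) overriding the result.

```python
import unicodedata


def first_strong_direction(paragraph, default="ltr"):
    """Guess a paragraph's base direction from its first strong character.

    Simplified reading of UAX #9 rules P2/P3: scan for the first character
    whose bidi class is L (left-to-right) or R/AL (right-to-left).
    Hypothetical helper, not part of any library.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (e.g. only digits or punctuation).
    return default


RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong R character

persian_first = "پرسش من از Unicode این است"
english_first = "Unicode and پرسش"

print(first_strong_direction(persian_first))        # Persian letter comes first
print(first_strong_direction(english_first))        # Latin letter comes first
print(first_strong_direction(RLM + english_first))  # RLM comes first
```

Under this rule, a paragraph that opens with a Persian word resolves to rtl on its own, while one that opens with a Latin word resolves to ltr unless something earlier in the paragraph, such as a leading U+200F RIGHT-TO-LEFT MARK, is strongly right-to-left. Whether a given mail client or device applies the rule at all is a separate matter.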


