Re: Directionality Standard

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Dec 17 2007 - 20:45:21 CST

    The default behavior is defined in UAX#9. However, there are circumstances
    in which this needs to be overridden. This is described in that document.

    Mark
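
As an illustration of the default behavior mentioned above: UAX#9 derives a paragraph's default direction from its first strongly directional character (rules P2 and P3) and lets a higher-level protocol override that result. The following is a minimal Python sketch of that idea only, not the full bidi algorithm. The function name `paragraph_direction` and its `override` parameter are hypothetical names chosen for this example, and the sketch ignores the isolate-skipping detail of rule P2.

```python
import unicodedata

# Bidi classes treated as "strong" when looking for the paragraph direction.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def paragraph_direction(text, override=None):
    """Return "ltr" or "rtl" for a single paragraph of text.

    `override` is a hypothetical stand-in for a higher-level protocol,
    such as an application's explicit rtl setting.
    """
    if override in ("ltr", "rtl"):
        return override
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    # No strong character found: fall back to left-to-right.
    return "ltr"


if __name__ == "__main__":
    # Starts with a Persian letter, so the first strong character is rtl.
    print(paragraph_direction("پرسش من از Unicode این است"))
    # Starts with a Latin word, so the default is ltr unless overridden.
    print(paragraph_direction("Unicode این است"))
    print(paragraph_direction("Unicode این است", override="rtl"))
```

Under this heuristic, the Persian sentence quoted below is detected as rtl because it begins with a Persian letter, whereas a paragraph that begins with a Latin word defaults to ltr unless the application overrides it.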

    On Dec 17, 2007 6:33 PM, Behnam <behnam.rassi@gmail.com> wrote:

    > Thank you very much, Mark. Should I conclude that in any case where the rtl
    > paragraph is not reproduced, the device or application is not Unicode
    > compliant?
    > This amounts to many!
    > And why do we need a language tag in HTML or other formats if this property
    > is defined within the encoded paragraph?
    > -B
    >
    > On 17-Dec-07, at 8:58 PM, Mark Davis wrote:
    >
    > There may be some misunderstanding. Unicode does define the default
    > direction of a paragraph for use with the bidi algorithm (which determines
    > the ordering of characters in text containing bidirectional scripts like
    > Arabic or Hebrew).
    >
    > See http://unicode.org/reports/tr9/
    >
    > Mark
    >
    > On Dec 17, 2007 4:23 PM, Behnam <behnam.rassi@gmail.com> wrote:
    >
    > > Thank you. So the answer is no. Unicode does not define the
    > > directionality of a paragraph. Then I guess my next question should be: why?
    > > I think I have some explaining to do.
    > > Unicode defines a very complex bidi behaviour of characters, and it
    > > defines the beginning and ending of a paragraph (I assume). Yet, it doesn't
    > > define what directionality this paragraph should take to arrange these
    > > characters within the paragraph.
    > > Defining the directionality of a paragraph is more important than
    > > defining the language of a text. Yes, a language tag can help language-aware
    > > devices and applications behave accordingly. But a directionality definition
    > > is not about the 'user-friendly' behaviour of a text; it is about reproducing
    > > the raw text, as intended by its Unicode encoding.
    > > Understanding this issue may be very easy or very difficult,
    > > depending on the extent to which you have been exposed to the rtl experience.
    > > In the next paragraph, I write a Persian line, throwing in a couple of English
    > > words, in left-to-right directionality, to give you an idea of what
    > > right-to-left users experience on an everyday basis.
    > > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین
    > > نکرده است.
    > > In order to read the above phrase correctly in Persian, the order of
    > > words should be as I numbered below (from right to left):
    > > پرسش1 من2 از3 Unicode4 این5 است6 که7 چرا8 برای9 پاراگراف10
    > > directionality11 تبیین12 نکرده13 است14.
    > >
    > > Of course I can set this paragraph in my application to "rtl" and, thanks
    > > to the wonders of the bidi behaviour of characters, everything will be put in place:
    > >
    > >
    > > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین
    > > نکرده است.
    > >
    > > But I have absolutely no guarantee that my rtl text in an email, in a
    > > text message, in an online forum posting... will be received in an rtl setting.
    > > This perfectly Unicode-encoded text is at the mercy of applications,
    > > devices, mediums and platforms. And more likely than not, my rtl paragraph
    > > will be received in ltr and in the order that I numbered above! Even in
    > > more controlled situations such as word processors, as a friend of mine has
    > > experienced, this Persian phrase written in the rtl setting of Nisus on a Mac,
    > > exported in .doc format, and opened on a Windows platform will produce an
    > > rtl, but 'Arabic', document! Not only an Arabic-script document, which it is,
    > > but an Arabic-language document!
    > >
    > > You can experiment with this dilemma yourself. Set your application to rtl
    > > (which can be done in many applications) and write something in English or any
    > > Roman-script language. As long as the whole phrase is Roman, you only get a
    > > misplaced final period at the far left. But if you throw a couple of Hebrew
    > > words into the phrase, then you'll see what a wrong directionality setting
    > > can do to your English. Of course, you are not exposed to this dilemma
    > > because the default directionality of all computerized devices and
    > > applications is left to right. But it gives you an idea of what rtl users
    > > go through on an everyday basis.
    > >
    > > Again, this is not about requesting a convenience. It is about
    > > asking Unicode to do what it set out to do. Unicode encodes the bidi
    > > behaviour of characters, the beginning of a paragraph, and the end of a
    > > paragraph. It must encode its directionality too.
    > >
    > > Behnam
    > >
    > >
    > > On 17-Dec-07, at 4:20 AM, Stephane Bortzmeyer wrote:
    > >
    > > On Sat, Dec 15, 2007 at 11:08:40AM -0500,
    > > Behnam <behnam.rassi@gmail.com> wrote
    > > a message of 78 lines which said:
    > >
    > > Is there any Unicode standard to identify a text? i.e. primary
    > > script>directionality>language?
    > >
    > >
    > > Not a Unicode standard but, yes, there is a standard to tag texts to
    > > indicate language, script, etc. It's RFC 4646. See
    > > http://www.langtag.net/ for a start.
    > >
    > >
    > >
    >
    >
    > --
    > Mark
    >
    >
    >

    -- 
    Mark
    


    This archive was generated by hypermail 2.1.5 : Mon Dec 17 2007 - 20:47:18 CST