From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Dec 17 2007 - 20:45:21 CST
The default behavior is defined in UAX#9. However, there are circumstances
in which this needs to be overridden. This is described in that document.
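
As a rough illustration of that default, here is a minimal Python sketch of the "first strong character" rule (rules P2 and P3 of UAX#9). The function name default_paragraph_level is hypothetical, and the sketch is simplified: it handles a single paragraph and does not skip characters inside directional isolates. It relies only on unicodedata.bidirectional from the Python standard library.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, a character with bidi class R


def default_paragraph_level(text):
    """Return 0 (ltr) or 1 (rtl) for a single paragraph.

    Simplified sketch of UAX#9 rules P2/P3: the first character with a
    strong bidi class (L, R or AL) decides the paragraph level.
    Assumption: directional isolates are not handled here.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0  # no strong character at all: fall back to ltr


if __name__ == "__main__":
    # Starts with an Arabic-script letter, so the default is rtl.
    print(default_paragraph_level("پرسش من از Unicode این است"))
    # Starts with a Latin letter, so the default is ltr.
    print(default_paragraph_level("Unicode پرسش"))
    # A leading RLM is one plain-text way to get an rtl paragraph anyway.
    print(default_paragraph_level(RLM + "Unicode پرسش"))
```

An application that ignores this rule, or a paragraph whose first strong character does not match the intended direction, is where an override (a mark such as RLM, or a higher-level protocol such as markup) comes in.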
Mark
On Dec 17, 2007 6:33 PM, Behnam <behnam.rassi@gmail.com> wrote:
> Thank you very much, Mark. Should I conclude that whenever an rtl
> paragraph is not reproduced as rtl, the device or application is not
> Unicode compliant?
> That amounts to a great many!
> And why do we need a language tag in HTML or other formats if this property
> is defined within the encoded paragraph?
> -B
>
> On 17-Dec-07, at 8:58 PM, Mark Davis wrote:
>
> There may be some misunderstanding. Unicode does define the default
> direction of a paragraph for use with the bidi algorithm (which determines
> the ordering of characters containing bidirectional scripts like Arabic or
> Hebrew).
>
> See http://unicode.org/reports/tr9/
>
> Mark
>
> On Dec 17, 2007 4:23 PM, Behnam <behnam.rassi@gmail.com> wrote:
>
> > Thank you. So the answer is no. Unicode does not define the
> > directionality of a paragraph. Then I guess my next question should be why?
> > I think I have some explaining to do.
> > Unicode defines a very complex bidi behaviour of characters, and it
> > defines the beginning and ending of a paragraph (I assume). Yet, it doesn't
> > define what directionality this paragraph should take to arrange these
> > characters within the paragraph.
> > Defining the directionality of a paragraph is more important than
> > defining the language of a text. Yes, a language tag can help language-aware
> > devices and applications behave accordingly. But directionality definition
> > is not about the 'user-friendly' behaviour of a text; it is about reproducing
> > the raw text, as intended by its Unicode encoding.
> > Understanding this issue, I suppose, may be very easy or very difficult,
> > depending on the extent to which you have been exposed to rtl text. In the
> > next paragraph, I write a Persian line with a couple of English words
> > thrown in, set in left-to-right directionality, to give you an idea of
> > what right-to-left users experience on an everyday basis.
> > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین
> > نکرده است.
> > In order to read the above phrase correctly in Persian, the order of
> > words should be as I numbered below (from right to left):
> > پرسش1 من2 از3 Unicode4 این5 است6 که7 چرا8 برای9 پاراگراف10
> > directionality11 تبیین12 نکرده13 است14.
> >
> > Of course I can set this paragraph to "rtl" in my application and, thanks
> > to the wonders of the bidi behaviour of characters, everything will be put
> > in place:
> >
> >
> > پرسش من از Unicode این است که چرا برای پاراگراف directionality تبیین
> > نکرده است.
> >
> > But I have absolutely no guarantee that my rtl text in an email, in a
> > text message, in an online forum posting... will be received in an rtl
> > setting. This perfectly Unicode-encoded text is at the mercy of
> > applications, devices, media and platforms. And more likely than not, my
> > rtl paragraph will be received in ltr and in the order that I numbered
> > above! Even in more controlled situations such as word processors, as a
> > friend of mine has experienced, this Persian phrase written in the rtl
> > setting of Nisus on a Mac, exported in .doc format, and opened on a Windows
> > platform will produce an rtl, but 'Arabic' document! Not only an
> > Arabic-script document, which it is, but an Arabic-language document!
> >
> > You can experiment with this dilemma yourself. Set your application to rtl
> > (which can be done in many applications) and write something in English or
> > any other Roman-script language. As long as the whole phrase is Roman, you
> > only get a misplaced final period at the far left. But if you throw a
> > couple of Hebrew words into the phrase, then you'll see what a wrong
> > directionality setting can do to your English. Of course, you are not
> > exposed to this dilemma because the default directionality of all
> > computerized devices and applications is left to right. But it gives you an
> > idea of what rtl users go through on an everyday basis.
> >
> > Again, this is not about requesting a convenience. It is about
> > requesting Unicode to do what it set out to do. Unicode encodes the bidi
> > behaviour of characters, the beginning of a paragraph, the end of a
> > paragraph. It must encode its directionality too.
> >
> > Behnam
> >
> >
> > On 17-Dec-07, at 4:20 AM, Stephane Bortzmeyer wrote:
> >
> > On Sat, Dec 15, 2007 at 11:08:40AM -0500,
> > Behnam <behnam.rassi@gmail.com> wrote
> > a message of 78 lines which said:
> >
> > Is there any Unicode standard to identify a text? i.e. primary
> > script>directionality>language?
> >
> >
> > Not a Unicode standard but, yes, there is a standard to tag texts to
> > indicate language, script, etc. It's RFC 4646. See
> > http://www.langtag.net/ for a start.
> >
> >
> >
>
>
> --
> Mark
>
>
>
-- Mark
This archive was generated by hypermail 2.1.5 : Mon Dec 17 2007 - 20:47:18 CST