Re: Arabic renderer in four lines of Perl

From: Francois Yergeau (yergeau@alis.com)
Date: Sat Jun 27 1998 - 13:16:03 EDT


At 14:14 24-06-98 -0700, Roman Czyborra wrote:
>Doesn't the algorithm get the
>global direction wrong if my English sentence starts with an Arabic
>word?

Yes.

> Wouldn't it be better to have no heuristics instead of insecure
>heuristics?

But what do you do when you have no heuristics and no higher-level protocol
to tell you the base directionality (i.e., in plain text)?
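
To make the trade-off concrete, here is a minimal sketch of one possible
plain-text heuristic: take the base direction from the first character with a
strong directional type. This is an assumption for illustration, not
necessarily the heuristic in the Perl renderer under discussion; the function
name and the "ltr" default are hypothetical.

```python
import unicodedata

def guess_base_direction(text, default="ltr"):
    """Guess the base direction from the first strongly directional character.

    Hypothetical helper: returns "ltr" or "rtl", or `default` when the text
    contains no strong character at all (digits, punctuation, spaces only).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return default
```

Such a sketch gets exactly the case raised above wrong: an English sentence
that happens to start with an Arabic word is classified as "rtl".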

> Why
>does HTML/RFC2070 have entities for U+200[EF] (&lrm;&rlm;) but not for
>U+202[A-E] (LRE,RLE,PDF,LRO,RLO)?

Because the latter map naturally onto the element model, in which content
is affected by being contained in a particular element, whereas LRM and RLM
do not, since they affect only their immediate surroundings. LRE et al. are
exceptional in Unicode in that they create statefulness, which is nicely
expressed in HTML as element structure. Of course, it would have been
possible to *also* define entities, but this would have been redundant.
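
The statefulness can be illustrated with a small sketch that tracks how
deeply a plain-text string is nested inside embedding and override
characters, much as one would track open elements in markup. The function
name and return shape are hypothetical, and it only counts openers and PDFs;
it does not implement the bidirectional algorithm.

```python
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
PDF = "\u202C"                                      # POP DIRECTIONAL FORMATTING

def embedding_state(text):
    """Return (deepest nesting reached, number of openers left unclosed)."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1                  # like an element start tag
            deepest = max(deepest, depth)
        elif ch == PDF and depth > 0:
            depth -= 1                  # like the matching end tag
        # LRM (U+200E) and RLM (U+200F) fall through: they change no state
    return deepest, depth
```

LRM and RLM leave the state untouched, which is why a standalone entity suits
them, while an unbalanced opener leaves a nonzero second value, the plain-text
counterpart of an unclosed element.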

>Does BDO="RTL" really equal RLO?

Yes, by definition. From RFC 2070 4.2.4:

 The effect of BDO is to force the directionality of all characters
 within it to the value of DIR, irrespective of their intrinsic
 directional properties. It is equivalent to using the LEFT-TO-RIGHT
 OVERRIDE (202D) or RIGHT-TO-LEFT OVERRIDE (202E) characters of ISO
 10646, the end tag again being equivalent to the POP DIRECTIONAL
 FORMATTING (202C) character.
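
As a sketch of the equivalence described in that passage, the following
hypothetical helper produces the plain-text counterpart of a BDO element with
a given DIR value. The function name and the assumption that the content is
already plain text (no nested markup) are mine, not the RFC's.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def bdo_as_plain_text(content, direction):
    """Wrap `content` in the override characters matching a BDO element.

    `direction` is the DIR attribute value, "ltr" or "rtl" (case-insensitive).
    The start tag becomes LRO or RLO; the end tag becomes PDF.
    """
    opener = {"ltr": LRO, "rtl": RLO}[direction.lower()]
    return opener + content + PDF
```

For example, bdo_as_plain_text("abc", "RTL") yields RLO, then "abc", then PDF.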

Regards,

-- 
François Yergeau <yergeau@alis.com>
Alis Technologies inc., Montréal
Tél : +1 (514) 747-2547
Fax : +1 (514) 747-2561


