Re: RTL PUA?

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sun, 21 Aug 2011 23:08:21 -0700

On 8/21/2011 7:34 PM, Doug Ewell wrote:
> So what you are asking about is a directional control character that would assign subsequent characters a BC of 'AL', right?
>
> You don't want to call this a LANGUAGE MARK or anything else that implies language identification, because of the existence of "real" language identification mechanisms and the history of Unicode and language tagging.

An ARM (Arabic RTL Mark) would be a sensible addition to the standard.
It would close a small gap in design that currently prevents a fully
faithful plain text export of bidi text from rich text (higher level
protocol) formats.

In an HLP you can assign any run to behave as if it were following a
character with bidi property AL.

When you export this text as plain text, unless there is an actual AL
character, you cannot get the same behavior (other than by the
heavy-handed method of completely overriding the directionality, making
your plain text less editable).

So, yes, there's a bit of a use case for such a mark.

(Its effect is limited to treatment of numeric expressions, so it's not
an "Arabic language" mark, but one that triggers the same bidi context
as the presence of an Arabic Script (AL) character.)
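
To make the numeric effect concrete, here is a minimal sketch of the one
step of the bidi algorithm where AL and R differ (rule W2 of UAX #9): a
European digit (EN) is reclassified as an Arabic number (AN) when the
nearest preceding strong character is AL. The function name and the
`sos` default are assumptions made for illustration; the sketch ignores
embedding levels, overrides, and every other rule, so it is not a bidi
implementation.

```python
import unicodedata

# The three strong bidi classes discussed in this thread.
STRONG = {"L", "R", "AL"}

def resolve_european_digits(text, sos="L"):
    """Return the bidi class of each character after applying only W2.

    `sos` stands in for the start-of-sequence direction (hypothetical
    default; a real implementation derives it from embedding levels).
    """
    last_strong = sos
    resolved = []
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in STRONG:
            # Remember the most recent strong class seen so far.
            last_strong = bc
        elif bc == "EN" and last_strong == "AL":
            # W2: European digits after an AL context become AN.
            bc = "AN"
        resolved.append(bc)
    return resolved

# An actual Arabic letter (U+0627) puts the digits in an AL context.
print(resolve_european_digits("\u0627 12"))
# RLM (U+200F) is strong R, so the digits stay EN.
print(resolve_european_digits("\u200f12"))
```

With RLM in front, the digits keep class EN; only a real AL character
changes them to AN. That difference is the gap a mark with class AL
would close in plain text.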

A./
>
> --
> Doug Ewell • doug_at_ewellic.org
> Sent via BlackBerry by AT&T
>
> -----Original Message-----
> From: Richard Wordingham<richard.wordingham_at_ntlworld.com>
> Sender: unicode-bounce_at_unicode.org
> Date: Mon, 22 Aug 2011 03:19:39
> To: Unicode Mailing List<unicode_at_unicode.org>
> Subject: Re: RTL PUA?
>
> On Sun, 21 Aug 2011 23:55:46 +0000
> "Doug Ewell"<doug_at_ewellic.org> wrote:
>
>> What's a LANGUAGE MARK?
> There are *three* strong directionalities - 'L' left-to-right, 'AL'
> right-to-left as in Arabic, 'R' right-to-left (as in Hebrew, I
> suspect). 'AL' and 'R' have different effects on certain characters
> next to digits - it's the mind-numbing part of the BiDi algorithm.
> With one a $ sign after a string of European (or is it Arabic?) digits
> appears on the left and in the other it appears on the right. I
> can't remember whether 'higher-level protocols' have an effect on this
> logic. LRM has a BC of L, RLM has a BC of R, but no invisible character
> has a BC of AL. That's why I tentatively raised the notion of ARABIC
> LANGUAGE MARK. Incidentally, an RLO gives characters with a
> temporary BC of R, not AL.
>
> Richard.
Received on Mon Aug 22 2011 - 01:11:23 CDT
