Re: RTL PUA? from Richard Wordingham on 2011-08-21 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 21 Aug 2011 16:48:35 +0100

On Sun, 21 Aug 2011 01:44:02 +0000
"Doug Ewell" <doug_at_ewellic.org> wrote:

>> The more I think of it, the more I like the idea of reassigning the
>> default BC of Plane 16 to 'R'. What would the arguments against this
>> be?

>> BC of 'AL'?

> Would that really be a better default? I thought the main RTL needs
> for the PUA would be for unencoded scripts, not for even more Arabic
> letters. (How many more are there anyway?)

Not necessarily better, I'm just suggesting that both need to be
supported. However, we need to look at use cases.

(1) Unencoded Arabic script letters with joining behaviour, for use with
any application.

(a) We need the character to have a BC of AL, R or ON for it to be
included in BiDi runs. If we use ON we may need RLM when the character
is at the edge of a run, and even then, its behaviour may be no better
than that of a character with a BC of R.

(b) It may get left out of script runs. There were problems on
Windows with the Tamil ligature k.SS not rendering, despite font
support, when the character U+0BB7 TAMIL LETTER SSA was new. And
that's in a left-to-right script with a character in the appropriate
block!

(2) Complete right-to-left script. I'm presuming the difference
between AL and R is then a matter of what right-to-left script the
potential users chiefly also use.

(a) As a practical implementation, the distinction between AL and R
would matter if the script has modern use. Otherwise, any of ON, AL
and R would do, though one might face the annoyance of having to start
chunks of text with RLM. If a script with modern use should be encoded
using a BC of R, then I believe ON would also do as a stop-gap until
the script is encoded.

How fiendish is BiDi-sensitive transliteration?

(b) For experimentation, I believe the difference between AL, R and ON
would matter little, even though it would be irritating to have to
use RLM.

(c) Complex script support is patchy - one might be restricted to
applications that allow the font to provide full complex script support.
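
The RLM annoyance mentioned in (2)(a) and (2)(b) can be illustrated with
a minimal sketch of the first-strong rule for paragraph direction. It is
a simplification (isolates and embeddings are ignored), and the
pua_as_on override is a hypothetical stand-in for a tailored property
table, not an existing API. It assumes Python's unicodedata reports the
UCD default for private-use characters.

```python
import unicodedata

RLM = "\u200F"  # RIGHT-TO-LEFT MARK, bidi class R


def paragraph_direction(text, bidi_class=unicodedata.bidirectional):
    """Simplified first-strong rule: the first L, R or AL character decides.

    Isolates and embeddings are ignored here, unlike the full algorithm.
    """
    for ch in text:
        bc = bidi_class(ch)
        if bc == "L":
            return "ltr"
        if bc in ("R", "AL"):
            return "rtl"
    return "ltr"  # no strong character: fall back to left-to-right


def pua_as_on(ch):
    """Hypothetical override: treat Plane 16 private-use characters as ON."""
    if 0x100000 <= ord(ch) <= 0x10FFFD:
        return "ON"
    return unicodedata.bidirectional(ch)


pua_word = "\U00100000\U00100001"

print(paragraph_direction(pua_word))                   # UCD default for the PUA
print(paragraph_direction(pua_word, pua_as_on))        # no strong character at all
print(paragraph_direction(RLM + pua_word, pua_as_on))  # RLM supplies the strong R
```

With ON, the private-use text contributes no strong character, so the
direction has to come from somewhere else, such as a leading RLM.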

The big issue in all this, though, is (i) how to update the rendering
system with a new set of values for Unicode properties, including
script, and (ii) the scope of such an update. (The distinction between
the PUA and the rest is that it makes sense for PUA properties to
change as freely as fonts.) This, incidentally, is analogous to locales
reflecting code page selections. There is also, though less pressing,
the issue of tailoring collations. (The worst issue is that there are
distinct, canonically inequivalent characters of type Lo comparing
equal - I've seen it for Canadian Aboriginal Syllabics on Windows XP and
for Thai in Ubuntu 10.04 - surely that's not the normal British
collation of such characters.)

One minor problem with (i) *was* that it wasn't clear how one should
annotate a copy of UnicodeData.txt to show that it has been modified.
The standard XML alternative allows for comments, thereby solving that
problem.
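
As a rough sketch of what modifying a copy of UnicodeData.txt might
involve, the following rewrites the bidi class (field 4 of the
semicolon-separated record) for the Plane 16 private-use range. The
function name patch_bidi_class and the idea of reporting the changed
code points separately are illustrative assumptions; the record of what
was changed is kept outside the data lines because it is unclear where
it would go inside them.

```python
PLANE16_FIRST, PLANE16_LAST = 0x100000, 0x10FFFD


def patch_bidi_class(lines, new_bc, first=PLANE16_FIRST, last=PLANE16_LAST):
    """Return (patched_lines, changed_code_points) for UnicodeData.txt-style lines.

    Fields are semicolon-separated; field 0 is the code point in hex and
    field 4 is the bidi class.
    """
    patched, changed = [], []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) > 4 and fields[0]:
            cp = int(fields[0], 16)
            if first <= cp <= last and fields[4] != new_bc:
                fields[4] = new_bc
                changed.append(cp)
        patched.append(";".join(fields))
    return patched, changed


sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;",
    "10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;",
]
patched, changed = patch_bidi_class(sample, "R")
for line in patched:
    print(line)
print("modified:", ", ".join(f"U+{cp:04X}" for cp in changed))
```

Because the private-use ranges are given as First/Last pairs, patching
those two records covers the whole range.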

If Issue (i) can be readily solved at the machine or user level or
lower, then the default properties of the PUA become irrelevant.
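
One way to picture a user-level solution to Issue (i) is an overlay that
consults user-supplied ranges before falling back to the standard
values. This is only a sketch: the PropertyOverlay class and the two
example ranges are hypothetical, and a real rendering system would need
the same treatment for script and other properties.

```python
import unicodedata
from bisect import bisect_right


class PropertyOverlay:
    """Hypothetical user-level overlay: range overrides first, UCD values otherwise."""

    def __init__(self, ranges):
        # ranges: iterable of (first, last, bidi_class); assumed non-overlapping
        self._ranges = sorted(ranges)
        self._starts = [r[0] for r in self._ranges]

    def bidi_class(self, ch):
        cp = ord(ch)
        # Find the last range whose start is <= cp, if any.
        i = bisect_right(self._starts, cp) - 1
        if i >= 0:
            first, last, bc = self._ranges[i]
            if cp <= last:
                return bc
        return unicodedata.bidirectional(ch)


overlay = PropertyOverlay([
    (0xE000, 0xE0FF, "AL"),     # e.g. extra Arabic-script letters
    (0x100000, 0x10FFFD, "R"),  # e.g. a complete right-to-left script
])

print(overlay.bidi_class("\uE000"))      # overridden to AL
print(overlay.bidi_class("\U00100000"))  # overridden to R
print(overlay.bidi_class("\u05D0"))      # UCD value for HEBREW LETTER ALEF
print(overlay.bidi_class("A"))           # UCD value for LATIN CAPITAL LETTER A
```

With such a layer in place, whatever default the standard assigns to the
PUA only matters for code points the user has not overridden.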

Richard.
Received on Sun Aug 21 2011 - 10:52:00 CDT