RE: RTL PUA?

From: Doug Ewell <doug_at_ewellic.org>
Date: Fri, 19 Aug 2011 07:13:51 -0700

Petr Tomasek <tomasek at etf dot cuni dot cz> wrote:

> I would like to ask why there are no PUA parts which would be reserved
> for RTL scripts (i.e. would have the directionality set to "strong
> RTL").

The PUA is supposed to be a free and open sandbox, without reserved or
allocated zones. There was supposed to be a Corporate Use Subarea,
starting at U+F8FF and working down, and an End User Subarea, starting
at U+E000 and working up, but even that is not widely honored.

My question would be why the PUA is designated as 'L' by default at all,
instead of, say, 'ON'.

Section 16.5 of TUS 6.0 says:

"The Unicode Character Database provides default character properties,
which implementations can use for the processing of private-use
characters. In addition, users of private-use characters may exchange
external data that allow them also to exchange private-use characters in
a semantically consistent way between implementations. The Unicode
Standard provides no predefined format for such an exchange."

So your private agreement, in addition to specifying the meaning of your
PUA characters and probably some sample glyphs, can also specify their
properties, overriding the default properties. But these lines in
UnicodeData.txt:

E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;

do give the impression that these code points are somehow reserved
for strong-LTR characters (the 'L' field), and also for non-reordering
characters (the '0' combining class, i.e. no combining marks), neither
of which is true.
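
To make the "defaults plus private override" idea concrete, here is a
minimal Python sketch. It reads the two range lines quoted above, looks
up the default properties through the standard unicodedata module, and
then layers a private-agreement table on top. The PRIVATE_OVERRIDES
table, the parse_line and pua_properties helpers, and the particular
code points and values in the table are all made up for illustration;
as the TUS quote says, Unicode defines no format for such an exchange.

```python
import unicodedata

# The two range lines quoted above, verbatim from UnicodeData.txt.
RANGE_LINES = [
    "E000;<Private Use, First>;Co;0;L;;;;;N;;;;;",
    "F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;",
]


def parse_line(line):
    """Extract only the fields discussed here; the rest are ignored."""
    fields = line.split(";")
    return {
        "code": int(fields[0], 16),
        "name": fields[1],
        "gc": fields[2],         # General_Category
        "ccc": int(fields[3]),   # Canonical_Combining_Class
        "bidi": fields[4],       # Bidi_Class
    }


first, last = (parse_line(line) for line in RANGE_LINES)

# Hypothetical private agreement (NOT a Unicode-defined format):
# code point -> properties that replace the defaults.
PRIVATE_OVERRIDES = {
    0xE000: {"bidi": "R"},                  # say, a right-to-left letter
    0xE001: {"bidi": "NSM", "ccc": 230},    # say, a combining mark above
}


def pua_properties(cp):
    """Default properties for a BMP PUA code point, with overrides applied."""
    if not first["code"] <= cp <= last["code"]:
        raise ValueError(f"U+{cp:04X} is not in the BMP Private Use Area")
    ch = chr(cp)
    props = {
        "gc": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "ccc": unicodedata.combining(ch),
    }
    # The private agreement wins wherever it says something.
    props.update(PRIVATE_OVERRIDES.get(cp, {}))
    return props


for cp in (0xE000, 0xE001, 0xE002):
    print(f"U+{cp:04X}", pua_properties(cp))
```

A code point with no entry in the table, such as U+E002 here, simply
keeps the defaults from the UCD, which is the behavior the TUS text
describes. Software that is not party to the agreement sees only those
defaults.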

There's a lot of misinformation and FUD about the PUA, and unfortunately
I expect at least one response of the form "The PUA is evil, don't use
it," which accomplishes very little.

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Received on Fri Aug 19 2011 - 09:16:42 CDT
