Re: Private Use areas from Ken Whistler via Unicode on 2018-08-21 (Unicode Mail List Archive)

From: Ken Whistler via Unicode <unicode_at_unicode.org>
Date: Tue, 21 Aug 2018 11:03:41 -0700

On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
> On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
>> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
>>> Is there a block of RTL PUA also?
>> No.
> Perhaps there should be?

This is a periodic suggestion that never goes anywhere--for good reason.
(You can search the email archives and see that it keeps coming up.)

Presuming that this question was asked in good faith...

>
> What about designating a part of the PUA to have a specific property?

The problem with that is that assigning *any* non-default property to
any PUA code point would break existing implementations' assumptions
about PUA character properties and potentially create havoc with
existing use.
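
For reference, the default properties that implementations currently see for PUA code points can be inspected directly. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper name default_properties and the choice of sample code points are illustrative only.

```python
import unicodedata

# One sample code point from each of the three PUA ranges
# (BMP, Plane 15, Plane 16).
samples = [0xE000, 0xF0000, 0x100000]

def default_properties(cp):
    """Return a few core properties of a code point as a dict."""
    ch = chr(cp)
    return {
        "gc": unicodedata.category(ch),        # General_Category
        "bc": unicodedata.bidirectional(ch),   # Bidi_Class
        "ccc": unicodedata.combining(ch),      # Canonical_Combining_Class
    }

pua_props = {f"U+{cp:04X}": default_properties(cp) for cp in samples}
for name, props in pua_props.items():
    print(name, props)
```

Every sample reports gc=Co, bc=L, ccc=0. Code that has relied on those values for years is what a non-default assignment would disturb.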

> Only certain properties matter enough:

That is an un-demonstrated assertion that I don't think you have thought
through sufficiently.

> * wide
> * RTL

RTL is not some binary counterpart of LTR. There are 23 values of
Bidi_Class, and anyone who wanted to implement a right-to-left script in
PUA might well have to make use of multiple values of Bidi_Class. Also,
there are two major types of strong right-to-leftness: Bidi_Class=R and
Bidi_Class=AL. Should a "RTL PUA" zone favor Arabic type behavior or
non-Arabic type behavior?
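
The spread of Bidi_Class values inside real right-to-left scripts is easy to see. This is a small sketch, again assuming Python's unicodedata module; the particular characters are just convenient examples.

```python
import unicodedata

# Characters drawn from Hebrew and Arabic text, plus two for contrast.
chars = ["\u05D0", "\u0628", "\u0661", "\u05B7", "1", "a"]

# Map each character's Unicode name to its Bidi_Class value.
bidi = {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}

for name, bc in bidi.items():
    print(f"{bc:4} {name}")
```

A Hebrew letter is R, an Arabic letter is AL, an Arabic-Indic digit is AN, a Hebrew point is NSM, and an ASCII digit is EN. A single blanket "RTL" value for a PUA zone would cover only the first of those cases.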

> * combining

Also not a binary switch. Canonical_Combining_Class is a numeric value,
and any value but ccc=0 for a PUA character would break normalization.
Then for the General_Category, there are three types of "marks" that
count as combining: gc=Mn, gc=Mc, gc=Me. Which of those would be favored
in any PUA assignment?
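
Both points can be illustrated with standard characters. This is a hedged sketch using Python's unicodedata module; the sample marks and test strings are arbitrary choices, and the behavior shown for U+E000 is simply what a current normalizer does with a ccc=0 code point.

```python
import unicodedata

# Three kinds of combining mark, each with its own gc and ccc.
marks = ["\u0301", "\u0323", "\u0903", "\u20DD"]
mark_info = {
    f"U+{ord(ch):04X}": (unicodedata.category(ch), unicodedata.combining(ch))
    for ch in marks
}
print(mark_info)

# Marks with non-zero ccc are reordered by normalization:
# U+0323 (ccc=220) sorts before U+0301 (ccc=230).
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")

# A PUA code point has ccc=0, so it acts as a starter and blocks
# reordering across it; the sequence comes back unchanged.
with_pua = "a\u0301\uE000\u0323"
unchanged = unicodedata.normalize("NFD", with_pua)
print(reordered == "a\u0323\u0301", unchanged == with_pua)
```

If U+E000 were suddenly given a non-zero ccc, strings already normalized under the current data could stop being normalized, which is the breakage referred to above.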

> as most others are better represented in the font itself.

Really? Suppose someone wants to implement a bicameral script in PUA.
They would need case mappings for that, and how would those be "better
represented in the font itself"? Or how about digits? Would numeric
values for digits be "better represented in the font itself"? How about
implementation of punctuation? Would segmentation properties and
behavior be "better represented in the font itself"?
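
A quick check shows how little a system library knows about a PUA character beyond its defaults. This sketch assumes Python's built-in str methods and unicodedata module; a font installed for U+E000 would change none of these results.

```python
import unicodedata

pua = "\uE000"

# What the runtime reports for a PUA character today.
pua_behavior = {
    "upper_is_identity": pua.upper() == pua,
    "lower_is_identity": pua.lower() == pua,
    "numeric": unicodedata.numeric(pua, None),
    "isalpha": pua.isalpha(),
    "isdigit": pua.isdigit(),
}
print(pua_behavior)

# Standard characters, for contrast.
print("a".upper(), unicodedata.numeric("\u0663"))
```

Case mappings and numeric values live in the property data used by the library, not in glyph tables, so they have to come from somewhere other than the font.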

>
> This could be done either by parceling one of existing PUA ranges: planes 15
> and 16 are virtually unused thus any damage would be negligible;

That is simply an assertion -- and not the kind of assertion that the
UTC tends to accept on spec. I rather suspect that there are multiple
participants on this email list, for example, who *do* have
implementations making extensive use of Planes 15/16 PUA code points for
one thing or another.

> or perhaps
> by allocating a new range elsewhere.

See:

https://www.unicode.org/policies/stability_policy.html

The General_Category property value Private_Use (Co) is immutable: the
set of code points with that value will never change.
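
The fixed set can be counted. The following is a minimal sketch, assuming Python's unicodedata module; the constant PUA_RANGES is written out by hand here and compared against what the library reports.

```python
import sys
import unicodedata

# The three Private Use ranges: BMP, Plane 15, Plane 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

# Size of the set according to the ranges above.
expected = sum(hi - lo + 1 for lo, hi in PUA_RANGES)

# Size of the set according to the library's General_Category data.
actual = sum(
    1
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Co"
)

print(expected, actual)  # 137468 137468
```

The last two code points of Planes 15 and 16 are noncharacters rather than Private Use, which is why those ranges end at xxFFFD.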

That guarantee has been in place since 1996, and is a rule that binds
the UTC. So nope, sorry, no more PUA ranges.

> Meow!

Grrr! ;-)

As I see it, the only feasible way for people to get specialized
behavior for PUA ranges involves first ceasing to assume that somehow
they can jawbone the UTC into *standardizing* some ranges for some
particular use or another. That simply isn't going to happen. People who
assume this is somehow easy, and that the UTC are a bunch of boneheads
who stand in the way of obvious solutions, do not -- I contend --
understand the complicated interplay of character properties, stability
guarantees, and implementation behavior baked into system support
libraries for the Unicode Standard.

The way forward for folks who want to do this kind of thing is:

1. Define a *protocol* for reliable interchange of custom character
property information about PUA code points.

2. Convince more than one party to actually *use* that protocol to
define sets of interchangeable character property definitions.

3. Convince at least one implementer to support that protocol to create
some relevant interchangeable *behavior* for those PUA characters.

And if the goal for #3 is to get some *system* implementer to support
the protocol in widespread software, then before starting any of #1, #2,
or #3, you had better start instead with:

0. Create a consortium (or other ongoing organization) with a 10-year
time horizon and participation by at least one major software
implementer, to define, publicize, and advocate for support of the
protocol. (And if you expect a major software implementer to
participate, you might need to make sure you have a business case
defined that would warrant such a 10-year effort!)
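
To make step 1 concrete, here is one possible shape such an interchange format could take. This is purely a hypothetical sketch in Python: the semicolon-separated layout is loosely modeled on UnicodeData.txt, and every name in it (PuaRecord, parse_line, dump, load, the field order) is an assumption made for illustration, not an existing protocol or anyone's proposal.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

def is_pua(cp: int) -> bool:
    """True if cp lies in one of the three Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

@dataclass(frozen=True)
class PuaRecord:
    """Hypothetical per-code-point record; defaults match current PUA data."""
    code_point: int
    gc: str = "Co"                 # General_Category
    ccc: int = 0                   # Canonical_Combining_Class
    bc: str = "L"                  # Bidi_Class
    numeric: str = ""              # Numeric_Value, empty if none
    upper: Optional[int] = None    # simple uppercase mapping, if any

    def to_line(self) -> str:
        upper = "" if self.upper is None else f"{self.upper:04X}"
        fields = [f"{self.code_point:04X}", self.gc, str(self.ccc),
                  self.bc, self.numeric, upper]
        return ";".join(fields)

def parse_line(line: str) -> PuaRecord:
    """Parse one record line, rejecting anything outside the PUA."""
    cp, gc, ccc, bc, numeric, upper = line.strip().split(";")
    record = PuaRecord(int(cp, 16), gc, int(ccc), bc, numeric,
                       int(upper, 16) if upper else None)
    if not is_pua(record.code_point):
        raise ValueError(f"U+{record.code_point:04X} is not a PUA code point")
    return record

def dump(records: Iterable[PuaRecord]) -> str:
    return "\n".join(r.to_line() for r in records) + "\n"

def load(text: str) -> Dict[int, PuaRecord]:
    records = (parse_line(ln) for ln in text.splitlines() if ln.strip())
    return {r.code_point: r for r in records}

# A made-up private agreement: a cased letter pair and a digit.
agreement = [
    PuaRecord(0xE000, gc="Lu", bc="R"),
    PuaRecord(0xE001, gc="Ll", bc="R", upper=0xE000),
    PuaRecord(0xE010, gc="Nd", bc="R", numeric="1"),
]
text = dump(agreement)
table = load(text)
print(text)
```

Writing such a format is the easy part. Steps 2 and 3 still require other parties to exchange these tables and an implementer to make text processing consult them.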

--Ken
Received on Tue Aug 21 2018 - 13:04:05 CDT