Re: An attempt to focus the PUA discussion [long]

From: Ernest Cline (ernestcline@mindspring.com)
Date: Sat May 01 2004 - 08:20:46 CST


> [Original Message]
> From: Philippe Verdy <verdy_p@wanadoo.fr>
>
>
> ----- Original Message -----
> From: "Kenneth Whistler" <kenw@sybase.com>
>
>
> > > Providing
> > > private use characters with a default ccc other than 0 would
> > > open combining classes for private use in a manner that
> > > could be consistently normalized regardless of whether
> > > the implementation was a party to the private use or not.
> >
> > Note that these could *not* be any existing PUA code points,
> > for the following reason.
> >
> > <reason snipped to save space>
> >
> > Clearly this is disallowed by normalization stability guarantees.
> >
> > So if you or anybody else is proposing such a change, make
> > sure that it is in the context of defining a *new*
> > block of private use characters, off the BMP and not
> > Planes 15 or 16.
>
> I would like to oppose to your point of view: an application that does not know
> what is the private codepoint 10001 will need to (and MUST) handle it with
> combining class 0 to guarantee stability of the encoded text, simply because it
> does not know if its a symbol, a base letter or a combining character or a
> format control. It will preserve the order in any case.
>
> An application that _knows_ what 10001 means (by knowing which private
> convention is used and intended by its user), may assign its own properties,
> including changing the combining class from 0 to 230, and thus allowing
> reordering of the sequence above if it matches the private convention. This
> means that the sequence above _will_ be reordered to 0061 0323 10001, and the
> 10001 does not block now the composition of 0061 0323 (if such composition
> exists, I did not check what these codepoints mean, but it does not matter
> here).

I don't see that what you say here, Philippe, contradicts Ken very much.
I read him as pointing out only that, for the default properties of the
*existing* Private Use characters, the Unicode stability guarantees prevent
assigning a different default canonical combining class to those existing
characters. Private Use versions of Unicode are outside the stability
guarantees, but the defaults that are to be assumed in the absence of a
Private Use agreement are not.
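
To make the two views concrete, here is a minimal Python sketch of the
canonical reordering step alone (no decomposition or composition). It is
an illustration under stated assumptions, not anyone's actual
implementation: U+F0001 stands in for the private codepoint written as
10001 in the quoted text, and private_ccc is a hypothetical private
agreement that gives it combining class 230. The default properties come
from Python's standard unicodedata module.

```python
import unicodedata

# Assumption: a Plane 15 private use code point stands in for "10001".
PUA = "\U000F0001"


def canonical_reorder(text, ccc=unicodedata.combining):
    """Stable-sort each run of non-starters (ccc != 0) by combining class."""
    out = []
    run = []
    for ch in text:
        if ccc(ch) == 0:
            run.sort(key=ccc)  # list.sort is stable, as the algorithm requires
            out.extend(run)
            run = []
            out.append(ch)     # a starter ends the current run
        else:
            run.append(ch)
    run.sort(key=ccc)
    out.extend(run)
    return "".join(out)


def private_ccc(ch):
    # Hypothetical private agreement: the PUA character is a mark with ccc 230.
    return 230 if ch == PUA else unicodedata.combining(ch)


text = "a" + PUA + "\u0323"                           # 0061 <PUA> 0323
default_result = canonical_reorder(text)              # default ccc for PUA is 0
private_result = canonical_reorder(text, private_ccc)
```

With the default properties the private use character is a starter, so
default_result keeps the original order. Under the hypothetical private
agreement, U+0323 (ccc 220) sorts ahead of the private use character, so
private_result is 0061 0323 followed by the private codepoint.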

However, this does point out the desirability of having codepoints whose
semantics are defined privately but whose property values are defined by
Unicode. An implementation conforming to a particular Private Use should
be able to normalize according to that Private Use, but doing so makes the
data unfit to be examined by any tools not aware of that Private Use. This
complicates the task for Private Uses, as they essentially have to reinvent
not only the wheel, but the lever, pulley, and screw as well. If standard
tools could follow the standard Unicode rules while using the standard
default Unicode properties, it would be easier to accommodate Private Uses.
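
As a rough illustration of why privately normalized data is unfit for
tools unaware of the Private Use, the Python sketch below feeds both
orderings to a standard normalizer. The assumptions are mine: U+F0001
stands in for the private codepoint, and privately_normalized is the order
that a hypothetical private combining class of 230 would produce.

```python
import unicodedata

# Assumption: a Plane 15 private use code point stands in for the private mark.
PUA = "\U000F0001"

original = "a" + PUA + "\u0323"          # 0061 <PUA> 0323
privately_normalized = "a\u0323" + PUA   # order under a hypothetical private ccc of 230

# A standard tool only knows the default properties (PUA has ccc 0).
std_original = unicodedata.normalize("NFC", original)
std_private = unicodedata.normalize("NFC", privately_normalized)

# To the standard tool these are two different, non-equivalent strings.
same_for_standard_tool = std_original == std_private
```

The standard tool leaves the original untouched, because the private use
character is a starter under the default properties and blocks the
composition. In the privately reordered text it composes 0061 0323 into
U+1EA1, so the two strings no longer compare as canonically equivalent.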


