From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Mar 30 2004 - 21:04:31 EST
> At 17:02 -0800 2004-03-30, Mike Ayers wrote:
> > It does not seem reasonable to
> > me that *any* standard behavior could be expected of PUA code
> >points, from operating systems or applications,
and Michael Everson responded:
> Which I assume means: "it's wrong for Unicode to make ANY property
> pronouncements for ANY PUA characters, since that defines them, and
> removes the P from the Use."
The problem is that real software has function (or method)
invocations in it like:

    character.getProperty()

And if a user, via whatever indirect stack of software may be
involved, manages to accomplish:

    character.setValue(0xE000)

then an invocation of character.getProperty() has to do something
more reasonable than trigger an access violation and freeze the
computer.
Or do you really think that PUA purists would prefer that kind of
behavior in their software? Hmmm?
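To make the point concrete, here is a minimal sketch of a property lookup that is defined for every code point. The Character class, its method names, the three-entry table, and the catch-all fallback are all hypothetical and exist only for illustration; real implementations use full property tables, and the defaults for unassigned code points are more varied than a single value.

```python
# Toy table: only a few code points carry an explicit Bidi_Class value here.
EXPLICIT_BIDI_CLASS = {0x0020: "WS", 0x0041: "L", 0x05D0: "R"}

# The three Private Use ranges of the Unicode code space.
PRIVATE_USE_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

# Assumption of this sketch: one catch-all value for everything else.
FALLBACK_BIDI_CLASS = "L"


class Character:
    """Hypothetical stand-in for the `character` object in the message."""

    def __init__(self, value=0x0041):
        self.set_value(value)

    def set_value(self, value):
        # Reject values outside the code space; anything inside is accepted.
        if not 0 <= value <= 0x10FFFF:
            raise ValueError(f"not a Unicode code point: {value:#x}")
        self.value = value

    def get_property(self):
        """Return a Bidi_Class value for any code point in the code space."""
        if self.value in EXPLICIT_BIDI_CLASS:
            return EXPLICIT_BIDI_CLASS[self.value]
        if any(lo <= self.value <= hi for lo, hi in PRIVATE_USE_RANGES):
            return "L"  # bc=L, the default assigned to PUA characters
        return FALLBACK_BIDI_CLASS


character = Character()
character.set_value(0xE000)
print(character.get_property())  # prints "L"
```

The only design point that matters here is that get_property() has a defined answer for every value that set_value() accepts.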
The bidirectional algorithm depends on a partition property, the
Bidi_Class (bc). Every code point that participates in the algorithm
has to have *some* value of that property for the algorithm to be
well-defined for
all encoded characters -- and that includes PUA characters, which
are encoded characters. The UTC could have chosen bc=ON or bc=BN
or bc=R or something completely stupid like bc=PDF instead of
bc=L as the default property for PUA characters, but "None of the
Above" was not an option. Based on their implementation experience
with use of PUA characters, bc=L made the most sense and was
the choice made by the UTC for the default.
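As a rough illustration, the following sketch uses Python's standard unicodedata module to show a PUA character taking part in one small step of the bidirectional algorithm. The paragraph_level function is a hypothetical, simplified reading of rules P2 and P3 (it ignores isolates and everything after the paragraph level is chosen), not a full implementation.

```python
import unicodedata


def paragraph_level(text):
    """Simplified P2/P3: take the level from the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0  # left-to-right paragraph
        if bidi_class in ("R", "AL"):
            return 1  # right-to-left paragraph
    return 0  # no strong character found: default to left-to-right


pua = "\ue000"          # first code point of the BMP Private Use Area
hebrew_alef = "\u05d0"  # a strong right-to-left character

print(unicodedata.bidirectional(pua))      # prints "L"
print(paragraph_level(pua + hebrew_alef))  # prints 0
print(paragraph_level(hebrew_alef + pua))  # prints 1
```

Because the PUA character has a Bidi_Class value, the loop treats it like any other strong left-to-right character and needs no special case.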
Consider another example. The normalization algorithm has to work
for *all* Unicode code points, assigned or not, because it guarantees
stability into the future when characters are encoded at code points
which were previously unencoded. It then obviously has to work for
PUA characters as well. That implies that two additional
properties *MUST* have some default values set for PUA characters.
One of those is decomposition, which is defaulted to the null string
(no decomposition) for all PUA characters. The other is canonical
combining class, which is defaulted to ccc=0 for all PUA characters.
Doing anything else would have just been stupid. But again,
"None of the Above" was not an option.
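The two defaults can be checked with Python's standard unicodedata module. This is only a sketch of their observable effect, not of how any particular normalizer is built; the sample strings are made up for illustration.

```python
import unicodedata

pua = "\ue000"  # first code point of the BMP Private Use Area

# The two defaults that normalization relies on for PUA characters.
print(repr(unicodedata.decomposition(pua)))  # prints ''
print(unicodedata.combining(pua))            # prints 0

# With no decomposition, every normalization form leaves the character alone.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, pua) == pua

# Adjacent combining marks are reordered by combining class (220 before 230).
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")
print(reordered == "a\u0323\u0301")  # prints True

# A PUA character between the marks has ccc=0, so it blocks the reordering.
blocked = "a\u0301" + pua + "\u0323"
print(unicodedata.normalize("NFD", blocked) == blocked)  # prints True
```

The last check shows why ccc=0 is the sensible default: a PUA character behaves as a starter, so marks on either side of it are never swapped across it.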
--Ken