RE: What is the principle?

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Mar 30 2004 - 21:04:31 EST

    > At 17:02 -0800 2004-03-30, Mike Ayers wrote:
    > > It does not seem reasonable to
    > >me that *any* standard behavior could be expected of PUA code
    > >points, from operating systems or applications,

    and Michael Everson responded:

    > Which I assume means: "it's wrong for Unicode to make ANY property
    > pronouncements for ANY PUA characters, since that defines them, and
    > removes the P from the Use."

    The problem is that real software has function (or method)
    invocations in it like:

    character.getProperty()

    And if a user, via whatever indirect stack of software may be
    involved, manages to accomplish:

    character.setValue(0xE000)

    then an invocation of character.getProperty() has to do something
    more reasonable than result in an access violation and freeze the
    computer.

    Or do you really think that PUA purists would prefer that kind of
    behavior in their software? Hmmm?
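
    A minimal sketch of that situation, using Python's standard
    unicodedata module as a stand-in for character.getProperty().
    The helper name get_properties is hypothetical, and the values
    returned are whatever the Unicode data bundled with the
    interpreter reports for the code point:

```python
import unicodedata


def get_properties(code_point: int) -> dict:
    """Return a few character properties for any code point.

    Hypothetical helper: it only wraps unicodedata lookups, which
    return a value (never raise) for every valid code point.
    """
    ch = chr(code_point)
    return {
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "canonical_combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


# U+E000 is the first code point of the BMP Private Use Area.
props = get_properties(0xE000)
for name, value in props.items():
    print(f"{name}: {value!r}")
```

    The lookup succeeds for a PUA code point only because the data
    carries default values for it; nothing in the call site has to
    treat U+E000 specially.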

    The bidirectional algorithm depends on a partition property. Every
    code point that participates in the algorithm has to have *some*
    value of that partition for the algorithm to be well-defined for
    all encoded characters -- and that includes PUA characters, which
    are encoded characters. The UTC could have chosen bc=ON or bc=BN
    or bc=R or something completely stupid like bc=PDF instead of
    bc=L as the default property for PUA characters, but "None of the
    Above" was not an option. Based on their implementation experience
    with use of PUA characters, bc=L made the most sense and was
    the choice made by the UTC for the default.
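
    One way to picture "default" here is as a lookup that always
    answers for PUA code points but lets a private agreement replace
    the answer. The following is only a sketch under that assumption:
    bidi_class, is_private_use, and the overrides table are
    hypothetical names, not part of any standard API.

```python
import unicodedata

# The three private use ranges: the BMP PUA and planes 15 and 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(code_point: int) -> bool:
    """True if the code point lies in one of the private use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PUA_RANGES)


def bidi_class(code_point: int, overrides=None) -> str:
    """Return a bidi class, defaulting PUA code points to 'L'.

    `overrides` is a hypothetical per-application mapping that stands
    in for a private agreement about specific PUA characters.
    """
    if overrides and code_point in overrides:
        return overrides[code_point]
    if is_private_use(code_point):
        return "L"  # the default discussed above
    # For unassigned code points Python returns an empty string; the
    # defaults for those are outside the scope of this sketch.
    return unicodedata.bidirectional(chr(code_point))


default_value = bidi_class(0xE000)
private_value = bidi_class(0xE000, overrides={0xE000: "R"})
print(default_value, private_value)
```

    The point of the design is that the function has an answer for
    every PUA code point whether or not anyone supplied an override.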

    Consider another example. The normalization algorithm has to work
    for *all* Unicode code points, assigned or not, because it guarantees
    stability into the future when characters are encoded at code points
    which were previously unencoded. It therefore obviously has to
    work for PUA characters as well. That implies that two additional
    properties *MUST* have some default values set for PUA characters.
    One of those is decomposition, which is defaulted to the null string
    (no decomposition) for all PUA characters. The other is canonical
    combining class, which is defaulted to ccc=0 for all PUA characters.
    Doing anything else would have just been stupid. But again,
    "None of the Above" was not an option.
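
    The effect of those two defaults can be checked with Python's
    unicodedata.normalize. This is an illustrative sketch, assuming
    the interpreter's bundled Unicode data; the variable names are
    arbitrary. With no decomposition, U+E000 passes through every
    normalization form unchanged, and with ccc=0 it acts as a starter,
    so combining marks are not reordered across it.

```python
import unicodedata

PUA = "\ue000"          # first BMP private use code point
ACUTE = "\u0301"        # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"    # COMBINING DOT BELOW, ccc=220

# A PUA character alone is unchanged by all four normalization forms.
unchanged = all(
    unicodedata.normalize(form, PUA) == PUA
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

# Adjacent marks are put into canonical order (220 before 230) ...
adjacent = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)

# ... but a ccc=0 character between them blocks the reordering.
separated = unicodedata.normalize("NFD", "a" + ACUTE + PUA + DOT_BELOW)

print(unchanged)
print([f"U+{ord(c):04X}" for c in adjacent])
print([f"U+{ord(c):04X}" for c in separated])
```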

    --Ken


