Re: What is the principle?

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 29 2004 - 18:14:40 EST

    Peter Kirk responded:

    > >Third, the proposal to "transfer ... some or all of the Variation
    > >Selectors on the SSP to Private Use" is unclear on the concept of
    > >Private Use. The UTC will make *no* semantic encoding commitment
    > >regarding what a private use character is to be used for. That would
    > >include *not* specifying that some range of Private Use characters
    > >be dedicated to use as variation selectors (privately defined). ...
    > >
    > >
    >
    > The problem here is that, despite what you say, the UTC has already
    > specified the character properties of all of the existing PUA
    > characters,

    No. The UTC has specified default values for the properties
    of all code points, including the PUA, to prevent implementers
    of property APIs from having them blow up or return random
    values for such code points.
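
To illustrate what "default values" means in practice, here is a minimal sketch using Python's standard `unicodedata` module. That module is only a convenient stand-in for a property API and is not something the message refers to; the helper name `default_properties` is hypothetical. The point is that a query on a PUA code point returns a well-defined answer instead of failing.

```python
import unicodedata

def default_properties(cp):
    """Return a few default property values for one code point.

    Hypothetical helper: it only wraps the standard unicodedata lookups.
    """
    ch = chr(cp)
    return {
        "category": unicodedata.category(ch),   # General_Category
        "bidi": unicodedata.bidirectional(ch),  # Bidi_Class
        "ccc": unicodedata.combining(ch),       # Canonical_Combining_Class
    }

# Sample code points from the BMP PUA and the two supplementary PUA planes.
for cp in (0xE000, 0xF8F0, 0xF0000, 0x10FFFD):
    print(f"U+{cp:04X}", default_properties(cp))
```

The values returned are defaults that keep the lookup well behaved. They describe what the API reports when nobody has said otherwise, not what a private user must do with those code points.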

    > in a way which rules out their use as variation selectors,
    > or as combining marks, or as right-to-left characters.

    They do not. A user of PUA characters is free to define the
    whole range of PUA characters as consisting of strong R-to-L
    characters and to implement accordingly. I have, for example, for
    my own internal use for developing collation tables, defined
    U+F8F0..U+F8F4 as being non-spacing combining marks (with no
    display), for use in providing variant weights. These are, in
    fact, very, very similar to what you are advocating for here as
    specialized variant selectors. But I do so by my own *PRIVATE*
    use of those characters, with code that assigns my own *PRIVATE*
    semantics and operates accordingly. I don't expect, in this
    case, to even require interoperability with anyone else, because
    the usage is internal and what matters are the weighted outputs
    in the tables. But in principle I could interchange this usage
    with someone else who chose to also treat U+F8F0..U+F8F4 as
    non-spacing combining marks with no display, indicating variant
    forms.

    The problem you are having is that you (and most implementers) are
    dependent on how the underlying *platform* treats the PUA, and
    have not been given an API which makes it easy to specify
    specific character properties along with your PUA character
    assignments.
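
As a rough sketch of the kind of API described here, the following Python layers privately assigned properties over the platform defaults. Everything in it is hypothetical (`Props`, `PrivateProperties`, `assign`, `lookup`); it is not an existing library and not the code used for the collation tables mentioned above. The assignment of U+F8F0..U+F8F4 as non-spacing marks follows the example in the message, while the bidi class `NSM` and combining class 0 are assumptions made only for illustration.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class Props:
    category: str  # General_Category, e.g. "Mn"
    bidi: str      # Bidi_Class, e.g. "NSM"
    ccc: int       # Canonical_Combining_Class

class PrivateProperties:
    """Hypothetical lookup: private assignments first, defaults otherwise."""

    def __init__(self):
        self._overrides = {}

    def assign(self, first, last, props):
        """Privately assign props to the inclusive range first..last."""
        for cp in range(first, last + 1):
            # Only private use code points (default category Co) may be overridden.
            if unicodedata.category(chr(cp)) != "Co":
                raise ValueError(f"U+{cp:04X} is not a private use code point")
        for cp in range(first, last + 1):
            self._overrides[cp] = props

    def lookup(self, cp):
        """Return the private assignment if there is one, else the default."""
        if cp in self._overrides:
            return self._overrides[cp]
        ch = chr(cp)
        return Props(unicodedata.category(ch),
                     unicodedata.bidirectional(ch),
                     unicodedata.combining(ch))

# Private agreement: U+F8F0..U+F8F4 are non-spacing marks.
# The NSM bidi class and ccc 0 are assumptions, not taken from the message.
private = PrivateProperties()
private.assign(0xF8F0, 0xF8F4, Props("Mn", "NSM", 0))

print(private.lookup(0xF8F0))  # privately assigned properties
print(private.lookup(0xF8F5))  # falls back to the platform defaults
```

Two parties who load the same assignment table get the same behavior, which is all that private interchange requires. Code that consults only the platform's own tables will still see the defaults.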

    >
    > As an alternative to adjusting the definitions of the existing variation
    > selectors, might it be possible for the UTC to adjust the character
    > properties of parts of the Supplementary Private Use Areas?

    If you are asking me, I'd say the answer is no.

    > For example,
    > a range of characters could be defined as default ignorable, default
    > collation weight [.0000.0000.0000.0000] etc., and so these could be used
    > as private variation selectors, or as private diacritical marks (which
    > would simply disappear if viewed with a regular font; they would be in
    > combining class 0 and so there would be no normalisation issues); and
    > another range could be defined as RTL; and whatever other ranges might
    > be required.

    You can do it privately. See above. But attempting to do such things
    in terms of formally specified usages of the PUA is an invitation
    to failure of interoperability.

    > Alternatively, an additional PUA could be defined to avoid
    > changing the properties of existing characters.

    This also won't happen. In my assessment the UTC is just dead set against
    trying to create this kind of mechanism through proliferating types
    of PUA spaces.

    > This cannot be in
    > conflict with the principle that "The UTC will make *no* semantic
    > encoding commitment regarding what a private use character is to be used
    > for" because these kinds of properties have already been specified by
    > the UTC for the existing PUA.

    Nope. You're wrong. A default value for a property is not a
    requirement by the UTC regarding what a PUA character can or may
    or must be used for.

    --Ken


