Re: Defined Private Use was: SSP default ignorable characters

From: Peter Kirk
Date: Thu Apr 29 2004 - 17:48:16 EDT


    On 29/04/2004 11:51, Kenneth Whistler wrote:

    > ...
    >Dean Snyder has come to the conclusion that use of the PUA is
    >problematical for the kinds of purposes he envisioned putting it
    >to, because support for custom properties is nonexistent in
    >easily available software, because support for display is spotty,
    >because ensuring that other people share the same conventions
    >as he might want to define is difficult, because he
    >or the other scholars he might want to work with may lack the
    >expertise and/or funds to accomplish the custom programming they
    >might need to do to use a PUA encoding effectively, and because
    >the time involved in getting the PUA to work for his research
    >might better be spent doing something else, including working
    >through an actual standardization proposal for the script(s)
    >he might be interested in.
    >I *agree* with those assessments.

    Thanks for the clarification.

    >Using the PUA for encoding some scholarly text should be a matter
    >of *last* resort, when no other option is really available. And
    >then the person who resorts to that should be prepared to use
    >the PUA with minimal generic support and with plans to export/convert
    >to other formats for particular kinds of processing and/or
    >rendering that they may require. And they should be prepared
    >to get their hands dirty with some programming to accomplish
    >what they need to do.
    >The PUA is basically a "wasteland", as Dean indicated. It is a
    >range of 137,468 code points that are provided for people to
    >do with what they will. Caveat emptor. It is silly to expect
    >that what one person might decide to do with them will not run
    >into problems with what somebody else might decide to do with
    >them -- after all, their interpretation is *deliberately* not
    >standardized -- that's the nature of "private use".

    To this, all I can say is that with all those caveats no emptor is going
    to buy the PUA, just as no one will rush to buy a piece of uncleared
    wasteland. But there are plenty of buyers (metaphorically of course, I
    am not talking money) for an improved piece of ground, and the UTC can
    make those improvements easily although the potential purchasers can't.
    So it is worth the UTC's while to improve the land which they have
    available to sell. Unfortunately, because there is no real money
    involved, they have little motivation to do so.
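
The figure of 137,468 code points quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the three Private Use ranges as they are defined in the Unicode Standard; the names `PUA_RANGES`, `range_size` and `is_private_use` are illustrative and do not come from the thread.

```python
# Private Use ranges, given as inclusive (first, last) code points.
PUA_RANGES = {
    "Private Use Area (BMP)": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A (plane 15)": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B (plane 16)": (0x100000, 0x10FFFD),
}


def range_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


def is_private_use(code_point):
    """True if the code point falls in any of the three ranges above."""
    return any(first <= code_point <= last
               for first, last in PUA_RANGES.values())


sizes = {name: range_size(*bounds) for name, bounds in PUA_RANGES.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size}")
print(f"Total private use code points: {total}")
```

The last two code points of planes 15 and 16 are noncharacters, which is why each supplementary area holds 65,534 code points rather than 65,536.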

    >>except apparently for purely
    >>internal use within one company, which is outside the scope of the standard.
    >I think you may have some serious misapprehensions about what
    >"use within one company" means these days in software
    >development. Software these days is massive, distributed,
    >and modular. The "private agreement" I have on some PUA
    >character's use may be shared publicly with some other group
    >developing some other piece of software. It may involve harmonizing
    >a decision on private use with private use defined by some *other*
    >company's software, without that decision ever rising to the level
    >of end-user visibility. Such issues are *not* outside the
    >scope of the standard. We depend on a common understanding of
    >what PUA code points are and how they might be used, as defined
    >by the standard. The particular interpretation that I might
    >use in a particular piece of software *is* outside the scope
    >of the standard, but that isn't correlated with whether that
    >usage is within one company or not.
    >As it happens, one of the major uses to which PUA code points are
    >actually put in major software today is in cross-mapping East
    >Asian code pages. And in such cases there are implicit "private
    >agreements" between the company that might define such a
    >cross-mapping involving an East Asian code page and another
    >company that may use such a cross-mapping or which may have to
    >provide a conversion table which emulates the same cross-mapping.
    >That is a *very* common situation in software development today.
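
As an illustration of the kind of cross-mapping described above, here is a minimal sketch of a conversion table that sends the user-defined region of a legacy double-byte code page to PUA code points and back. The byte values and the table itself are hypothetical and do not reproduce any vendor's actual mapping; the point is only that both parties must use the same table for the round trip to work.

```python
# Hypothetical mapping: two-byte codes from a "user-defined" region of a
# legacy code page, each assigned a BMP private use code point.
LEGACY_TO_PUA = {
    b"\xf0\x40": "\ue000",
    b"\xf0\x41": "\ue001",
    b"\xf0\x42": "\ue002",
}

# The reverse table that a second implementation would need to emulate.
PUA_TO_LEGACY = {pua: legacy for legacy, pua in LEGACY_TO_PUA.items()}


def decode_user_defined(data):
    """Convert a run of two-byte legacy codes to PUA characters."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    return "".join(LEGACY_TO_PUA[data[i:i + 2]]
                   for i in range(0, len(data), 2))


def encode_user_defined(text):
    """Convert PUA characters back to the legacy two-byte codes."""
    return b"".join(PUA_TO_LEGACY[ch] for ch in text)


legacy_bytes = b"\xf0\x40\xf0\x42"
as_unicode = decode_user_defined(legacy_bytes)
round_tripped = encode_user_defined(as_unicode)
print([hex(ord(ch)) for ch in as_unicode], round_tripped == legacy_bytes)
```

A converter built from a different table would still produce valid private use characters, but they would not mean the same thing, which is the "implicit private agreement" at issue.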

    Again, thanks for the clarification. I can see that the PUA is useful
    for doing such things with East Asian scripts, not because they are LTR
    but because their writing direction seems to be independent of Unicode
    character properties. It is unfortunate that there is not the same
    usefulness for scripts whose writing direction, combining character
    properties, etc., are defined by Unicode character properties.
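
The limitation described here can be seen by comparing the default properties of a private use character with those of a right-to-left letter and a combining mark. This is a minimal sketch using Python's standard `unicodedata` module, which reports only the defaults in the Unicode Character Database; the helper `default_properties` and the choice of sample characters are illustrative.

```python
import unicodedata


def default_properties(ch):
    """Return a few Unicode Character Database properties for one character."""
    return {
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
    }


samples = {
    "U+E000 (private use)": "\ue000",
    "U+05D0 HEBREW LETTER ALEF": "\u05d0",
    "U+0301 COMBINING ACUTE ACCENT": "\u0301",
}

for label, ch in samples.items():
    print(label, default_properties(ch))
```

The private use character is reported with general category `Co`, a left-to-right bidi class and combining class 0, so software that relies only on these defaults has no way to learn that a privately assigned character is meant to be right-to-left or combining.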

    Peter Kirk
