RE: An attempt to focus the PUA discussion [long]

From: Language Analysis Systems, Inc. Unicode list reader (
Date: Thu Apr 29 2004 - 17:17:42 EDT


    >> 1) Change the default properties for some range of the PUA. This is
    >> what people seem to be pushing most hard.
    >Actually, no. While it seems like an obvious first solution, the
    >problems you pointed out are quickly pointed out to such people so
    >that they start to push for option 2)

    I'm not hearing unanimity on this point, but if it's there, that's

    >> 2) Leave the current PUA alone, but set aside a new PUA, say Planes
    >> and 13. This solves the existing-use problem, but you still have the
    >> question of just how you subdivide the range, and it starts to cut
    >> down significantly on the code points available for actual
    >> standardization.
    >It won't require so many code points. I have been working on such a
    >proposal, and it requires not even a fifth of a plane, let alone two
    >full planes. I even hope to make it

    I'm intrigued.

    >> 3) Define ad-hoc standards that are based on Unicode but make
    >> character assignments in the PUA and lobby application vendors to
    >> support these encodings in addition to regular Unicode.
    >I just can't see application or OS vendors choosing to pick a single
    >PUA standard that is different from the Unicode defaults.

    And I'm not saying they should. I'm not envisioning "a single PUA
    standard"; I'm envisioning a separate PUA standard for each defined user
    community. Trying to unify all of the user communities' demands into a
    single PUA standard is tantamount to just putting the characters into

    Of course, getting app vendors to support a bunch of PUA standards is an
    even tougher sell, especially considering it's not a good idea in the
    first place.

    >> 4) Lobby for operating-system vendors to extend their text engines
    >> to allow properties of PUA code points to be configured.
    >While this would be a possible solution, it has some real drawbacks as
    >well, especially when different Private Use assignments overlap in the
    >codepoints they use and have character properties that differ for the
    >same character.

    There'd have to be a way to account for this, in the same way you account
    for different glyphs for the same character by using different fonts.
    There'd have to be some way of applying a "property set" to a particular
    run of characters. The most logical way would be to have this be
    somehow associated with a choice of font.
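
    A minimal sketch of that idea, assuming a text engine that consults a
    per-font override table before falling back to the Unicode defaults.
    The font names, the PROPERTY_SETS table, and the lookup() helper are
    all hypothetical; no existing engine is known to expose this interface.

```python
import unicodedata

# Hypothetical property sets, keyed by font name. Both fonts assign U+E000
# but give it different properties, which is the overlap case quoted above.
PROPERTY_SETS = {
    "ScriptA": {"\uE000": {"bidi": "R", "category": "Lo"}},
    "ScriptB": {"\uE000": {"bidi": "L", "category": "Mn"}},
}

def lookup(ch, prop, font=None):
    """Return a property of ch, preferring the override tied to the run's font."""
    override = PROPERTY_SETS.get(font, {}).get(ch, {})
    if prop in override:
        return override[prop]
    # No override for this font and character: use the Unicode defaults.
    if prop == "bidi":
        return unicodedata.bidirectional(ch)
    if prop == "category":
        return unicodedata.category(ch)
    raise KeyError(prop)

def run_properties(text, font, prop):
    """Resolve one property for every character in a single-font run."""
    return [lookup(ch, prop, font) for ch in text]
```

    Because the table is selected by the run's font, two private
    assignments of the same code point never collide within one run.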

    >> 5) Write specialized applications that are designed to deal with
    >> certain scripts and address the needs of user communities whose needs
    >> aren't being met right now.
    >The problem is, for most potential private uses, if there is sufficient
    >interest that a specialized application gets written just to handle
    >that one private use, it probably has enough interest to merit being
    >encoded in Unicode itself.

    Right. And if we're just talking about stopgap measures until something
    gets into Unicode, it seems like we can tolerate greater ugliness.

    As for user communities that'll never be served by Unicode, yeah, I
    think there should be a separate standard of some kind
    ("SourGrapes-icode") that can lobby for support from OS and app vendors
    on its own merits. Maybe I'm nuts, but I like to think the UTC is
    generally reasonable and if there's a good reason for something, it
    usually gets in eventually.

    >> 6) Use markup or other fancy-text mechanisms to override the default
    >> properties....
    >Markup already uses the selection of a font to establish which set of
    >Private Use characters is in use.

    No, font selection just determines which glyphs to draw.

    >Bidi Class and Line Break can be handled, if not elegantly, via the
    >existing codes.

    Right. I think there are also markup codes that do these things.
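
    As an illustration of "the existing codes", the sketch below wraps a
    run of PUA characters in the standard directional override characters
    and uses WORD JOINER to suppress line breaks inside it. The helper
    names force_rtl() and no_break() are hypothetical; the format
    characters themselves are standard Unicode.

```python
import unicodedata

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
WJ = "\u2060"   # WORD JOINER

def force_rtl(run):
    """Force a run to display right-to-left, whatever its characters' bidi class."""
    return RLO + run + PDF

def no_break(run):
    """Put WORD JOINER between characters to prohibit line breaks inside the run."""
    return WJ.join(run)

# PUA code points default to bidi class L, so an override is needed for RTL use.
default_class = unicodedata.bidirectional("\uE000")
wrapped = force_rtl("\uE000\uE001")
```

    This is the inelegant part: every run has to carry its own override
    characters, since the code points themselves keep their defaults.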

    >However, the behaviors that cause the most interest in a better defined
    >private use area, combining marks and cased letters, cannot be handled
    >in a generic manner by formatting characters. It simply is impossible
    >to simulate non-zero canonical combining class characters in Unicode
    >with anything other than a character with the appropriate canonical
    >combining class.

    I'm still clueless as to why this is a good idea. Combining class is
    there for one reason only-- normalization-- and imposing semantics on
    PUA characters can't change normalization. See my other note.
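
    The point about normalization can be checked with Python's standard
    unicodedata module. In this sketch, two standard combining marks are
    reordered by NFD according to their combining classes, while two PUA
    code points (chosen arbitrarily here) have class 0 and stay where
    they are.

```python
import unicodedata

def ccc(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]

# Standard marks: COMBINING ACUTE ACCENT (ccc 230), then COMBINING DOT BELOW (ccc 220).
standard = "a\u0301\u0323"
# Two arbitrary PUA code points standing in for privately defined marks.
private = "a\uE001\uE000"

# NFD sorts adjacent marks with non-zero classes into ascending class order.
nfd_standard = unicodedata.normalize("NFD", standard)
# PUA code points have class 0, so canonical reordering never moves them.
nfd_private = unicodedata.normalize("NFD", private)
```

    A privately agreed combining class would not change the second
    result, because a conformant normalizer uses only the standard values.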

    >Indicating the case would involve having a means to indicate which case
    >the character is and where its other case is

    Asking OS and app vendors for a way to control casing behavior seems an
    easier sell than getting them to make everything pluggable. It also
    seems like something that's pretty easy to code on its own, although
    then you have to use an external application to do case mapping rather
    than getting it for free in MS Word (or whatever).

    How big a demand is there for custom case mapping? Seems like most of
    the PUA things I've heard about aren't cased in the first place.
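
    For a sense of how little code an external case mapper needs, here is
    a minimal sketch. The code point assignments (26 uppercase letters at
    U+E100 and their lowercase partners at U+E120) are invented for the
    example, as are the function names.

```python
# Hypothetical private assignment: U+E100..U+E119 are uppercase letters and
# U+E120..U+E139 are the corresponding lowercase letters.
UPPER_BASE, LOWER_BASE, COUNT = 0xE100, 0xE120, 26

TO_UPPER = {LOWER_BASE + i: UPPER_BASE + i for i in range(COUNT)}
TO_LOWER = {upper: lower for lower, upper in TO_UPPER.items()}

def pua_upper(text):
    """Uppercase standard characters, then apply the private mapping."""
    return text.upper().translate(TO_UPPER)

def pua_lower(text):
    """Lowercase standard characters, then apply the private mapping."""
    return text.lower().translate(TO_LOWER)
```

    The standard str.upper() and str.lower() calls leave PUA code points
    untouched, so the private table is the only thing that maps them.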

    >> 7) Design custom fonts that cannibalize existing code points that
    >> have the right sets of properties.
    >This is a commonly chosen option right now, only it is usually done
    >with respect to legacy encodings so as to make use of the keyboard
    >mapping that is commonly associated with that legacy encoding. It's
    >not pretty, but most private users look for something that will make
    >their private use easy to accomplish, despite the problems it causes.
    >Any solution which requires users of the Private Use scheme to do more
    >than install a few files on their system to get it to work will
    >probably not be used, since schemes such as that embodied by option 7)
    >work now, despite the trouble they cause for others not using the
    >scheme.

    And I don't think there's anything wrong with that.

    --Rich Gillam
      Language Analysis Systems, Inc.
