Re: SSP default ignorable characters, was: Variation selectors and vowel marks

From: Doug Ewell
Date: Mon Apr 26 2004 - 11:35:16 EDT

    Peter Kirk <peterkirk at qaya dot org> wrote:

    >> ... And if you say, well, this won't work because Microsoft Word and
    >> Internet Explorer and other tools and vendors don't let me override
    >> the default PUA properties, I reply: do you really think they will be
    >> any quicker to support this new PUA block?
    > Yes, because the whole point of the definition of this block of
    > characters as default ignorable is that implementations are ALREADY
    > supposed to ignore these code points in processing and display, even
    > before they are defined as characters. I would expect the latest
    > versions of Unicode compatible tools to treat these code points as if
    > they were already defined default ignorable characters.

    You could try going ahead and writing a proposal to carve out part of
    the existing DI block as another private-use area. I suppose I know
    what the response will be: (1) we already gave you 137,000 private-use
    code points, what do you need more for? (2) if you say you need a new DI
    PUA, next somebody will want one for RTL, one for combining marks, one
    for font control, etc. etc., and (3) we don't want to be in the business
    of assigning properties to PUA characters anyway; the *default*
    properties we assigned are intended to be overridable by private

    The fact that Uniscribe and other rendering engines apply the *default*
    properties to all PUA code points, and provide no mechanism to modify
    them, is a fault in the rendering engines, although probably not a
    high-priority one in vendors' eyes.

    The Principles and Procedures document says that getting around
    short-term deficiencies in rendering technology is explicitly *not* a
    reason to create new characters, so I doubt it will be seen as a reason
    to create new private-use areas either.

    By far the most popular use of the PUA thus far has been as an ad-hoc
    glyph registry for technologies (or people) that regard code points and
    glyphs as 1-to-1. Very few people have tried to use the PUA for some
    purpose that the default properties don't handle. That doesn't mean a
    more flexible solution shouldn't be developed, but it does explain why
    the big vendors haven't bothered developing one.

    > On the other
    > hand, if I define my own PUA characters as default ignorable, I can
    > expect my private definitions NEVER to be supported by standard
    > software, because I can't make private agreements with Microsoft or
    > other significant software providers, although it is of course not
    > impossible that someone somewhere some time just might write software
    > which allows users to specify their own properties for PUA characters.

    This is really the mechanism that is needed. Maybe I'll try outlining a
    possible solution. If a small plug-in solution can be proven to the big
    vendors to be low-cost and effective, who knows what they might say?
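
    As a rough illustration of the kind of mechanism meant here, the
    sketch below layers user-supplied properties over the defaults for
    private-use code points. It is only an assumption about what such a
    plug-in might look like, not an existing API: the names CharProps,
    PuaPropertyTable, override and lookup are hypothetical. The defaults
    come from Python's standard unicodedata module, which does not expose
    Default_Ignorable_Code_Point, so that flag is simply treated as False
    unless a private agreement overrides it.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class CharProps:
    """Hypothetical bundle of the properties a renderer might consult."""
    category: str            # General_Category, e.g. "Co" or "Mn"
    bidi_class: str          # Bidi_Class, e.g. "L" or "R"
    combining: int           # Canonical_Combining_Class
    default_ignorable: bool  # Default_Ignorable_Code_Point


# The three private-use ranges (137,468 code points in total).
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_private_use(cp: int) -> bool:
    """Return True if cp lies in one of the private-use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


class PuaPropertyTable:
    """Per-user property overrides, restricted to private-use code points."""

    def __init__(self) -> None:
        self._overrides: dict[int, CharProps] = {}

    def override(self, first: int, last: int, props: CharProps) -> None:
        """Assign props to every code point in first..last (inclusive)."""
        if first > last:
            raise ValueError("empty range")
        if not all(is_private_use(cp) for cp in range(first, last + 1)):
            raise ValueError("overrides are only allowed in the PUA")
        for cp in range(first, last + 1):
            self._overrides[cp] = props

    def lookup(self, cp: int) -> CharProps:
        """Return the override if one exists, else the default properties."""
        if cp in self._overrides:
            return self._overrides[cp]
        ch = chr(cp)
        return CharProps(
            category=unicodedata.category(ch),
            bidi_class=unicodedata.bidirectional(ch),
            combining=unicodedata.combining(ch),
            default_ignorable=False,  # simplification: not in unicodedata
        )


# Usage: declare U+E000..U+E00F as privately agreed default ignorables.
table = PuaPropertyTable()
table.override(0xE000, 0xE00F, CharProps("Cf", "BN", 0, True))
ignorable = table.lookup(0xE005).default_ignorable
untouched = table.lookup(0xE010)  # falls back to the default properties
```

    A renderer would consult lookup() wherever it currently reads the
    built-in property tables, so code points outside the PUA, and PUA
    code points with no override, would behave exactly as they do today.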

    > OK, if I were a hacker I might be able to hack open source software,
    > but if I were a hacker I would find easier ways of hacking my
    > requirements into Unicode.

    I'm sure the open-source people would prefer that you spoke of
    "programming" or "developing" rather than "hacking." I do both
    regularly (no cracking, though) and trust me, they are not the same.

    > Doug, just be happy that your own private script is LTR with no
    > combining characters, and so can be supported in the PUA. It seems
    > that, in practice if not in principle, the PUA is restricted to such
    > scripts.

    Actually this has nothing to do with my script, which IE doesn't display
    properly anyway (it breaks lines arbitrarily between any two

    -Doug Ewell
     Fullerton, California
