Re: SSP default ignorable characters, was: Variation selectors and vowel marks

From: Peter Kirk (
Date: Mon Apr 26 2004 - 19:35:11 EDT

    On 26/04/2004 08:35, Doug Ewell wrote:

    >Peter Kirk <peterkirk at qaya dot org> wrote:
    >>>... And if you say, well, this won't work because Microsoft Word and
    >>>Internet Explorer and other tools and vendors don't let me override
    >>>the default PUA properties, I reply: do you really think they will be
    >>>any quicker to support this new PUA block?
    >>Yes, because the whole point of the definition of this block of
    >>characters as default ignorable is that implementations are ALREADY
    >>supposed to ignore these code points in processing and display, even
    >>before they are defined as characters. I would expect the latest
    >>versions of Unicode compatible tools to treat these code points as if
    >>they were already defined default ignorable characters.
    >You could try going ahead and writing a proposal to carve out part of
    >the existing DI block as another private-use area. I suppose I know
    >what the response will be:...

    Yes, and I can tell you my responses to those responses.

    >... (1) we already gave you 137,000 private-use
    >code points, what do you need more for? ...

    All 137,000 are defined with a set of properties which are not those
    which I need. These properties are defined in the Unicode data files.
    Although in principle software may allow these properties to be
    redefined, in practice no member of the Unicode Consortium has
    written software which supports such redefinition, which makes the
    redefinition a dead letter.
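
    To make the point concrete, here is a minimal sketch of what "redefinition"
    could look like in software. It uses Python's standard unicodedata module
    for the default properties; the override table, the helper names and the
    choice of U+E000 with category "Cf" are illustrative assumptions, not
    anything an existing product offers.

```python
import unicodedata

# The three private-use ranges: one in the BMP, plus planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

# Hypothetical private agreement: code point -> overriding general category.
# The choice of U+E000 and "Cf" is purely illustrative.
PRIVATE_OVERRIDES = {0xE000: "Cf"}


def pua_size() -> int:
    """Total number of private-use code points across the three ranges."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)


def general_category(cp: int, overrides=None) -> str:
    """General category of cp, consulting any private overrides first."""
    if overrides and cp in overrides:
        return overrides[cp]
    # Fall back to the default from the Unicode data files.
    return unicodedata.category(chr(cp))
```

    Without the overrides argument every private-use code point reports the
    default category "Co", which is the behaviour I am complaining about; the
    lookup with overrides is the part that no current software provides.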

    >... (2) if you say you need a new DI
    >PUA, next somebody will want one for RTL, one for combining marks, one
    >for font control, etc. etc., ...

    So? There is plenty of space available. First they would need to
    demonstrate a need. There is probably not a need for RTL as RTL override
    can be used.

    >... and (3) we don't want to be in the business
    >of assigning properties to PUA characters anyway; the *default*
    >properties we assigned are intended to be overridable by private

    But this is a dead letter, as we have seen, because there is no way that
    users can make private agreements with major systems software providers.

    >The fact that Uniscribe and other rendering engines apply the *default*
    >properties to all PUA code points, and provide no mechanism to modify
    >them, is a fault in the rendering engines, although probably not a
    >high-priority one in vendors' eyes.
    >The Principles and Procedures document says that getting around
    >short-term deficiencies in rendering technology is explicitly *not* a
    >reason to create new characters, so I doubt it will be seen as a reason
    >to create new private-use areas either.

    If the Unicode consortium members agree that the lack of support for
    redefinition of PUA properties is a short-term deficiency in their
    software products and commit themselves to remedying this deficiency in
    the medium term, fine. If the deficiency is actually a long-term one
    which the members have no intention of remedying, these Principles and
    Procedures do not apply.

    >>OK, if I were a hacker I might be able to hack open source software,
    >>but if I were a hacker I would find easier ways of hacking my
    >>requirements into Unicode.
    >I'm sure the open-source people would rather you spoke of "programming"
    >or "developing" rather than "hacking." I do both regularly (no
    >cracking, though) and trust me, they are not the same.

    If they saw how I program, they would agree that "hacking" is the right
    word! :-) My point is more that the kinds of changes I might make would
    come under what you would consider hacking, e.g. fixing code which
    defines a certain range of characters as default ignorable to cover part
    of the existing PUA. (The easier hack is probably to use part of the
    existing default ignorable range, perhaps the deprecated tags, as if it
    were PUA although it isn't.) A proper programming or development effort
    would depend on a properly designed mechanism for defining character

    Thinking about it, perhaps there would be more mileage in allowing the
    existing tags area (E0000–E007F), whose use is deprecated and which lies
    within the default ignorable range, to be used as a kind of private use
    area.
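
    A rough sketch of that idea follows. It assumes the plane 14 default
    ignorable range is E0000–E0FFF; the function names, and the notion of a
    "pseudo private use" test, are hypothetical and only illustrate how an
    application might treat the tags area by private agreement.

```python
# Sketch only: names and ranges here are illustrative assumptions,
# not taken from any shipping implementation.

# Plane 14 range assumed to be default ignorable, assigned or not.
DEFAULT_IGNORABLE_RANGE = range(0xE0000, 0xE1000)

# The deprecated tags area, E0000-E007F.
TAG_AREA = range(0xE0000, 0xE0080)


def is_default_ignorable(cp: int) -> bool:
    """True if cp falls in the plane 14 default ignorable range."""
    return cp in DEFAULT_IGNORABLE_RANGE


def is_pseudo_private_use(cp: int) -> bool:
    """Hypothetical: treat the tags area as private use by agreement."""
    return cp in TAG_AREA


def strip_for_display(text: str) -> str:
    """Drop default ignorable code points, as a renderer that does not
    know the private agreement would be expected to do."""
    return "".join(ch for ch in text if not is_default_ignorable(ord(ch)))
```

    Software that knows the agreement can test is_pseudo_private_use and act
    on the code point, while software that does not should simply ignore it,
    which is what strip_for_display imitates.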

    Peter Kirk (personal) (work)

    This archive was generated by hypermail 2.1.5 : Mon Apr 26 2004 - 20:12:15 EDT