Re: SSP default ignorable characters, was: Variation selectors and vowel marks

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 26 2004 - 21:41:53 EDT

    > >You could try going ahead and writing a proposal to carve out part of
    > >the existing DI block as another private-use area. I suppose I know
    > >what the response will be:...
    > >
    >
    > Yes, and I can tell you my responses to those responses.

    This is getting a little like those interactions:

    A: Joke #162!
    B: *ha! ha! ha!* Yeah, that's a funny one!

    >
    > >... (1) we already gave you 137,000 private-use
    > >code points, what do you need more for? ...
    > >
    >
    > All 137,000 are defined with a set of properties which are not those
    > which I need. These properties are defined in the Unicode data files.

    The *default* properties are given by the Unicode Character
    Database, but a user of the standard can define PUA characters
    to be whatever they want.
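
    To make that concrete, here is a minimal sketch of a property
    lookup that starts from the UCD defaults and lets a private
    agreement override them. The override table, its contents, and
    the function name are hypothetical, invented for illustration;
    only the `unicodedata` calls come from the Python standard library.

```python
import unicodedata

# Hypothetical private agreement: property overrides for selected PUA code points.
PRIVATE_OVERRIDES = {
    0xE000: {"category": "Mn", "bidi": "NSM"},  # treated as a combining mark
    0xE001: {"category": "Lo", "bidi": "R"},    # treated as a right-to-left letter
}


def char_properties(ch):
    """Return (general category, bidi class), preferring private overrides."""
    defaults = {
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
    }
    # Fall back to the UCD defaults for anything the agreement does not cover.
    merged = {**defaults, **PRIVATE_OVERRIDES.get(ord(ch), {})}
    return merged["category"], merged["bidi"]
```

    Any PUA code point missing from the table keeps its default
    properties, which is the sense in which the defaults are only
    defaults.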

    > Although in principle software may allow these properties to be
    > redefined, in practice no member of the Unicode Consortium has
    > written software which supports such redefinition,

    This is incorrect. As I have stated before, *I* have written
    such software, and I work for a company which is a member
    of the Unicode Consortium. What I use that redefinition for
    is *internal* to that software, however, and I won't claim
    that I write some end user GUI application that would make
    *you* happy about your intended usage of PUA characters and
    their properties.

    > which makes the
    > redefinition a dead letter.

    No, it means that people who want arbitrary redefinitions of
    such character properties have to be able to do the work
    themselves or hire someone to do it for them.

    > >... (2) if you say you need a new DI
    > >PUA, next somebody will want one for RTL, one for combining marks, one
    > >for font control, etc. etc., ...
    > >
    >
    > So? There is plenty of space available.

    Space is not the issue, of course. The issue is that the UTC
    is not about to start subdividing the PUA into little zones
    with different character properties.

    We have, by the way, plowed this field more than once on
    this list.

    > >... and (3) we don't want to be in the business
    > >of assigning properties to PUA characters anyway; the *default*
    > >properties we assigned are intended to be overridable by private
    > >agreement.

    >
    > But this is a dead letter as we have seen because there is no way that
    > users can make private agreements with major systems software providers.

    Correct. But I think you may have overblown expectations about
    what one should be able to do with PUA code points and what
    major systems software providers should be able or be required
    to support for them.

    > Thinking about it, perhaps there would be more mileage in allowing the
    > existing tags area, whose use is deprecated,

    Discouraged, not deprecated.

    > E0000–E007F so within the
    > default ignorable range, as a kind of private use area.

    That would be nonconformant use of the standard, and you could
    predict the results you will get if you go there.

    Peter, what "default ignorable" means is that if a rendering
    process does not interpret the character, it should be,
    by default, displayed as nothing, rather than with the
    more normal square box (or similar nondisplayable character
    glyph).
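
    As a rough sketch of that fallback rule: the range list below is
    only a hand-picked subset of the default ignorable code points
    (the full set is the Default_Ignorable_Code_Point property in the
    UCD), and the function names and the choice of U+25A1 as the
    "square box" are assumptions made for illustration.

```python
# Partial, hand-picked sample of default ignorable ranges (NOT the full UCD list).
DEFAULT_IGNORABLE_SAMPLE = [
    (0x00AD, 0x00AD),    # soft hyphen
    (0x034F, 0x034F),    # combining grapheme joiner
    (0x200B, 0x200F),    # zero width space/joiners, directional marks
    (0x2060, 0x206F),    # word joiner and related format characters
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0000, 0xE0FFF),  # tags and variation selectors supplement
]

MISSING_GLYPH = "\u25A1"  # stand-in for the usual "square box"


def is_default_ignorable(cp):
    """True if cp falls in one of the sample ranges above."""
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_SAMPLE)


def fallback_display(ch, supported):
    """What a renderer shows for ch, given the code points it interprets."""
    cp = ord(ch)
    if cp in supported:
        return ch             # interpreted normally
    if is_default_ignorable(cp):
        return ""             # uninterpreted but default ignorable: show nothing
    return MISSING_GLYPH      # uninterpreted otherwise: show the box
```

    Note that an uninterpreted PUA code point such as U+E000 takes
    the last branch and gets the box.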

    If you want to display characters implemented in the PUA
    at all, then you need a custom font, mapping glyphs from
    the PUA code points you have defined. Anybody else who wants
    to see your data will need that font. If you want to
    *simulate* default ignorable code points in your PUA
    usage, then map them to a zero glyph (no image or display
    width at all) in your font. As for special behavior for
    your default ignorable PUA code points when interpreted
    in combination with neighboring characters, well, you're
    on your own in defining such behavior, anyway.
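
    The zero-glyph trick can be sketched with a toy character-to-glyph
    map. Nothing here is a real font API: the `Glyph` record, the
    advance widths, and the PUA assignments are all hypothetical and
    stand in for whatever a custom font would actually contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    name: str
    advance: int       # horizontal advance in font units
    has_outline: bool  # False means nothing is drawn


NOTDEF = Glyph(".notdef", 600, True)          # the usual box for unmapped code points
ZERO_GLYPH = Glyph("pua_ignorable", 0, False)  # no image, no display width

# Hypothetical custom font: one visible PUA glyph, one "ignorable" PUA glyph.
CMAP = {
    0x41: Glyph("A", 650, True),
    0xE000: Glyph("pua_visible", 700, True),
    0xE001: ZERO_GLYPH,
}


def visible_run(text, cmap=CMAP):
    """Return (names of glyphs that draw something, total advance width)."""
    glyphs = [cmap.get(ord(ch), NOTDEF) for ch in text]
    drawn = [g.name for g in glyphs if g.has_outline]
    return drawn, sum(g.advance for g in glyphs)
```

    With this map, U+E001 between two letters contributes neither an
    image nor any width, while a PUA code point left out of the map
    still falls through to the box.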

    --Ken



    This archive was generated by hypermail 2.1.5 : Mon Apr 26 2004 - 22:13:48 EDT