Re: What is the principle?

From: Mark Davis
Date: Wed Mar 31 2004 - 20:01:43 EST

    comments below.

    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Kirk" <>
    To: "Mark Davis" <>
    Cc: <>
    Sent: Wed, 2004 Mar 31 19:15
    Subject: Re: What is the principle?

    > On 31/03/2004 14:27, Mark Davis wrote:
    > >While I disagree with most of what you've said on this list, it is not an
    > >unreasonable proposal to change the default properties for some ranges of the
    > >private use blocks. I don't think that this would, in practice, really affect
    > >any applications, because of #1 below.
    > >
    > >I have, however, a few observations.
    > >
    > >1. PUA properties, as is clear from Ken's excellent descriptions, are simply
    > >defaults. With the exception of normalization, no Unicode implementation is
    > >required to observe them. So even if this change is made, any conformant
    > >implementation is free to simply ignore it and just assign its own properties.
    > >This would not be a magic wand.
    > >
    > >
    > Understood. But I was rather thinking that at least some implementations
    > base their character properties directly on the Unicode character
    > database. Isn't this what ICU does? And so, if the PUA default
    > properties are the ones in the UCD, they would automatically be used by
    > implementations.

    Yes, some do (and ICU does pick up the default). Just pointing out that
    implementations can freely choose the properties (except normalization).

    BTW, you have been mentioning the combining class; you can have combining marks
    in the PUA, but they have to have zero combining classes.
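
To make the "defaults only" point concrete, here is a minimal sketch in Python. It assumes the `unicodedata` module that ships with the interpreter as the source of UCD defaults; `PRIVATE_OVERRIDES` and `effective_properties` are hypothetical names invented for illustration, and the override for U+E000 stands in for some private agreement. The combining class is deliberately left at zero, since that is the one thing tied to normalization.

```python
import unicodedata

# First code point of each private use range (BMP, plane 15, plane 16).
PUA_SAMPLES = ["\uE000", "\U000F0000", "\U00100000"]


def default_properties(ch):
    """Return the UCD defaults that Python's unicodedata reports for ch."""
    return {
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
    }


# Hypothetical private agreement: treat U+E000 as a right-to-left letter.
# The combining class is not overridden; it stays at the default of zero.
PRIVATE_OVERRIDES = {"\uE000": {"category": "Lo", "bidi": "R"}}


def effective_properties(ch):
    """UCD defaults overlaid with whatever this implementation chooses."""
    props = default_properties(ch)
    props.update(PRIVATE_OVERRIDES.get(ch, {}))
    return props


for ch in PUA_SAMPLES:
    print(f"U+{ord(ch):04X}", default_properties(ch), effective_properties(ch))
```

An implementation that reads the UCD directly sees only the first dictionary; one that layers its own table on top sees the second, and both are conformant.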

    > >2. Unicode properties are not sufficient for rendering. With technologies such
    > >as Apple's, all of the other work can be done in a font. With OpenType, most but
    > >not all can -- in particular, reordering has to be done by the
    > >So complex scripts that require reordering still would not be interchangeable
    > >without private agreement.
    > >
    > >
    > This is why the suggestions made for storing character properties in the
    > font are unrealistic; they require major restructuring of system
    > software (close to rewriting the whole OS, as I wrote earlier), not just
    > tinkering. I accept that there may be some practical limitations on PUA
    > complex scripts, but I would like them to be a lot less than they are now.

    ANY dynamic reassignment of properties requires a major overhaul. There have
    been proposals over the years for exchange of PU property data. All of them have
    died, and I never expect to see any succeed.

    The reason is that most implementations just get properties with static calls,
    e.g. isLetter(x). To change it to be dynamic, all of these calls in all programs
    would have to be changed to reference a dynamic collection of properties. In a
    single-threaded world, this wouldn't be too bad. But ours is a multi-threaded
    world, where such a change is nasty, and it becomes horrible if the same
    document is expected to contain different sets of PU properties. There are also
    performance implications, since properties are used so heavily in processing.
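
The difference between the two calling styles can be sketched roughly as follows. This is an illustration only, not any real library's API: `PropertyContext`, `count_letters_static`, and `count_letters_dynamic` are hypothetical names, and Python's `unicodedata` stands in for the static property tables.

```python
import unicodedata


def is_letter(ch):
    """Static style: one fixed table, no context argument."""
    return unicodedata.category(ch).startswith("L")


class PropertyContext:
    """Hypothetical per-document set of private use property assignments."""

    def __init__(self, pua_categories=None):
        self.pua_categories = dict(pua_categories or {})

    def is_letter(self, ch):
        # Fall back to the static table when the document says nothing.
        category = self.pua_categories.get(ch, unicodedata.category(ch))
        return category.startswith("L")


def count_letters_static(text):
    return sum(1 for ch in text if is_letter(ch))


def count_letters_dynamic(text, ctx):
    # Every caller now has to obtain the right context and pass it along.
    return sum(1 for ch in text if ctx.is_letter(ch))


# Two documents that assign U+E000 differently.
doc_a = PropertyContext({"\uE000": "Lo"})  # a letter
doc_b = PropertyContext({"\uE000": "Mn"})  # a nonspacing mark
text = "a\uE000"

print(count_letters_static(text),
      count_letters_dynamic(text, doc_a),
      count_letters_dynamic(text, doc_b))  # prints: 1 2 1
```

The same string yields different answers depending on which context is in force, so the context has to be threaded through every call site (and kept consistent across threads), which is the retrofit cost described above.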

    These are not whims of software vendors; they would be very expensive retrofits
    for essentially no benefit.

    > >3. Even excluding the normalization properties and other obvious inapplicable
    > >properties (such as name or age), there are some 50-odd possible character
    > >properties, many of them with multiple possible values: see
    > >
    > >
    > >
    > >
    > >
    > >A concrete proposal would have to specify exactly which properties were
    > >relevant, and what the values are for the proposed ranges. (Clearly an even
    > >partition according to all the possible combinations would be completely
    > >impractical.) If the goal is rendering, this means looking at the possible
    > >combinations of properties that are relevant for rendering and proposing a
    > >division that makes sense.
    > >
    > >
    > That is why I (rather than Ernest) have discussed only rendering related
    > properties like bidi and default ignorable. I realise that there may be
    > other properties which need to be considered, but I am not yet sure
    > which these are.

    Those alone won't work. If you want stuff to render right, then you have to
    include *any* property that systems may use to affect display. You do want these
    characters to linebreak correctly, eh? That's why I said that a complete
    proposal would have to spell out all the properties that would be considered, and
    give reasons for the inclusion/exclusions.
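
As a rough back-of-the-envelope check on why an even partition over all combinations is impractical, the sketch below compares the number of private use code points with a naive cross product of value counts for just four properties. It assumes only the properties that Python's `unicodedata` happens to expose (general category, bidi class, East Asian width, combining class); line breaking and the rest of the 50-odd properties are not included, so the real figure would be far larger. The value counts depend on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata
from math import prod

# The three private use ranges: BMP, plane 15, plane 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
pua_size = sum(last - first + 1 for first, last in PUA_RANGES)

# Only the properties this module exposes; a real proposal would need more.
PROPERTY_GETTERS = {
    "general_category": unicodedata.category,
    "bidi_class": unicodedata.bidirectional,
    "east_asian_width": unicodedata.east_asian_width,
    "combining_class": unicodedata.combining,
}

# Collect the distinct values each property takes across all code points.
distinct = {name: set() for name in PROPERTY_GETTERS}
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    for name, getter in PROPERTY_GETTERS.items():
        distinct[name].add(getter(ch))

# Naive upper bound: every value of every property combined with every other.
naive_combinations = prod(len(values) for values in distinct.values())

print("private use code points:", pua_size)
for name, values in distinct.items():
    print(f"  {name}: {len(values)} distinct values")
print("naive combinations of these four properties:", naive_combinations)
```

Even with only these four properties, the cross product exceeds the number of private use code points available, which is why a proposal has to pick a small set of relevant combinations rather than partition evenly.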

    > I sense that you prefer to change the default properties of existing PUA
    > characters rather than add new ones. Might it be sensible to adjust the
    > properties in one of the PUA planes but leave the other one untouched?
    > Has ANYONE actually defined characters in one or other of these planes,
    > and if so, which? It would make more sense to change the default
    > properties of a plane which no one is actually using.

    1. There is no way I would advocate adding even more PU characters; the number
    we have is wasteful as it is. (In hindsight, we shouldn't have gone beyond
    U+FFFFF in any event.)

    2. If you are going to make this proposal, I'd suggest using a small part of one
    plane, probably at the high end.

    > >Mark
    > >__________________________________
    > >
    > >► शिष्यादिच्छेत्पराजयम् ◄
    > >
    > >
    > >
    > --
    > Peter Kirk
    > (personal)
    > (work)