Re: What is the principle?

From: Peter Kirk
Date: Wed Mar 31 2004 - 12:11:39 EST

    On 30/03/2004 16:30, Kenneth Whistler wrote:

    > ...
    >Uh, sorry, Peter, but the implications here are so much b...., err, ...
    >The majority of the world's scripts are left-to-right. They also
    >happen to be non-Western. There are more *Indic* scripts encoded
    >in the Unicode Standard than *Western* scripts.
    >The majority of *entities* that the majority of users put into
    >PUA characters in actual application usage are unencoded CJK
    >ideograph variants and symbols from Asian code pages. It was
    >primarily the need to accommodate those *Eastern* users that drove
    >the setting of default values for the PUA.
    OK, in that case let's allocate properties to PUA characters in
    proportion to the number of RTL vs. LTR scripts, and the proportion of
    combining marks vs. base characters, in actual encoded scripts. The
    majority of PUA characters would remain unchanged. A significant
    minority would become RTL or non-spacing.

    A lot of effort has gone into accommodating certain *Eastern* users.
    Something like 100,000 CJK characters have already been defined, and
    already that is not enough and they have requisitioned two more planes
    of PUA with LTR properties. Fair enough if they might be needed. But
    what if users of certain other scripts e.g. RTL scripts want just a
    handful of PUA characters with the properties they need? Why is
    preference given to CJK? This sounds like bias to me, even if I was wrong
    to call it Western.

    >>This bias is also reflected in their
    >>system software which (as far as I know with no exceptions) does not
    >>allow users to specify properties for PUA characters other than the
    >>default decided by the UTC.
    >Bias? Or business sense?
    >If you want some specialized behavior for software, you either
    >write it yourself, or pay someone to write it, or convince someone
    >else that adding such a feature to the software *they* write
    >will pay for the investment cost in terms of incremental
    >increased sales.
    >You may not like how the software industry works, but thems
    >the breaks for any mature industry.

    Well, I don't quite see why it is business sense for software companies
    to support the huge PUAs for variant CJK characters, outside the 100,000
    or so already defined by Unicode. I do understand that it is business
    sense not to support user specification of properties, because that
    would be hard work for little or no gain.

    >Scenario: The UTC listens to you and defines some section of the PUA
    >as strong right-to-left by default for use in PUA-defined bidirectional
    >scripts. Somebody else is *already* using that section of the PUA
    >for something else. Now they have an interoperability problem,
    >because the default behavior they were depending on changes over
    >in some future version of some software, not under their control,
    >and their data gets munged by bidi.

    Well, they weren't supposed to rely on these default properties anyway,
    they were supposed to use the PUA at their own risk. They are not the
    only ones who are messed up by features of software which is not under
    their control. But it might be preferable in practice to define an
    additional PUA with RTL properties and one with default ignorable
    properties, outside all of the existing PUAs. I am not asking for a
    large space; very likely 256 characters of each type would be more than
    enough.

    >This is the kind of stuff the UTC refuses to start up by trying
    >to provide some subdivision of semantics in the PUA. *That* is
    >the principle, by the way, which guides the UTC position on
    >the PUA: Use at your own risk, by private agreement.
    >>What we do want is compatibility between our applications and the system
    >>software, and this proposal is the way to do that.
    >I don't see how any proposal to create some particular behavior
    >in the PUA is a way to accomplish that.
    If a new PUA is created with default RTL properties, one can expect that
    system software will soon support it, at least to the extent of defining
    these characters as RTL for the bidi algorithm and similar purposes. The
    same goes for default ignorable.

    > ...
    >A default value for a property is not a requirement by the UTC
    >*ON AN IMPLEMENTER* that they use that value. They can use whatever
    >property values they desire, but if they depart from what system
    >platforms provide them (by default) then they are buying themselves
    >an implementation task to get characters to do what they want.
    Ken, you are a master of understatement. The task they are buying
    themselves is a rewrite of the whole system. Companies don't provide the
    details needed for others to customise individual modules, and it would
    probably be a breach of copyright etc. to attempt to do so. Open Source
    is different here, of course.
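
To make the distinction between default property values and a private agreement concrete, here is a minimal Python sketch. It assumes the standard `unicodedata` module as the source of default values; the `PRIVATE_AGREEMENT` table and the helper names `bidi_class` and `general_category` are hypothetical and exist only for illustration. The override applies only inside code that calls these helpers. It does not change how system rendering or a platform bidi implementation treats the same code points, which is the gap discussed above.

```python
import unicodedata

# Hypothetical private agreement: code point -> (bidi class, general category).
# These assignments are invented for illustration and are not defined anywhere.
PRIVATE_AGREEMENT = {
    0xE000: ("R", "Lo"),    # treated as a right-to-left base letter
    0xE001: ("NSM", "Mn"),  # treated as a non-spacing combining mark
}


def bidi_class(ch, overrides=PRIVATE_AGREEMENT):
    """Return the agreed bidi class, else the default from unicodedata."""
    entry = overrides.get(ord(ch))
    return entry[0] if entry else unicodedata.bidirectional(ch)


def general_category(ch, overrides=PRIVATE_AGREEMENT):
    """Return the agreed general category, else the default from unicodedata."""
    entry = overrides.get(ord(ch))
    return entry[1] if entry else unicodedata.category(ch)


if __name__ == "__main__":
    for cp in (0xE000, 0xE001, 0xE002):
        ch = chr(cp)
        print(f"U+{cp:04X}",
              "default:", unicodedata.bidirectional(ch), unicodedata.category(ch),
              "agreed:", bidi_class(ch), general_category(ch))
```

Code points outside the table, such as U+E002 here, keep whatever default the library reports, so the agreement only has to list the characters that actually differ.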

    Peter Kirk
