Re: What is the principle?

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Mar 30 2004 - 19:30:10 EST

    Peter Kirk continued:

    > >A user of PUA characters is free to define the
    > >whole range of PUA characters as consisting of strong R-to-L
    > >characters and implementing accordingly. ...
    > >
    >
    > This is not true!

    It is true!!

    > Users can define only those properties which the
    > software that they are using allows them to define. Your argument here
    > completely ignores the distinction between users and software
    > developers.

    No it doesn't. I am well aware of the distinctions between end
    users, application developers, OS platform developers, and
    basic library implementers. I have, at one point or another,
    been in all of those shoes.

    The mistake you (and some others on this thread) are making
    is assuming that PUA characters were added to the standard
    with some kind of implicit guarantee that end users could
    define whatever they wanted there and that operating systems
    would somehow magically supply appropriate rendering and
    other behavior for them.

    People *can* define whatever they want in PUA characters, but
    if they expect something other than very dumb rendering and
    collation behavior to be provided by some other system, they
    are fooling themselves and each other. To do that kind of thing,
    you need to *also* do the work to *implement* that behavior
    for your definition.

    > You may have the luxury of being able to do both. But the
    > vast majority of users depend on the software systems and applications
    > provided by large corporate software companies. (Software written by
    > smaller companies generally uses rendering engines, character processing
    > etc provided by the large companies.)

    Of course. And nobody expects some individual or even some small
    company to be able to duplicate the entire Windows OS just in
    order to implement Tifinagh (or whatever) in PUA characters and
    have a Tifinagh-smart version of a word processing / typesetting
    system come rolling out of the garage for fine publications.

    That's the *REASON*, by the way, that the Unicode new scripts
    committee (and WG2) has the extensive roadmap of additional
    scripts to be encoded. We assume that the best way to get standard
    behavior out of standard software for obscure scripts is to
    *standardize* the character encoding for those scripts and keep
    pushing the big software companies to update their support for
    the latest additions to the standard. This works *much* better
    than futzing around with attempting to get custom behavior for
    complex scripts out of PUA characters.

    > These large companies are mostly
    > members of the Unicode consortium. They are also overwhelmingly western,
    > mostly American, and so inherently biased in favour of LTR scripts
    > without combining marks. This bias is reflected in the "default"
    > properties assigned to PUA characters, by their majority vote, and their
    > refusal to contemplate changes.

    Uh, sorry, Peter, but the implications here are so much b...., err, ...
    baloney.

    The majority of the world's scripts are left-to-right. They also
    happen to be non-Western. There are more *Indic* scripts encoded
    in the Unicode Standard than *Western* scripts.

    The majority of *entities* that the majority of users put into
    PUA characters in actual application usage are unencoded CJK
    ideograph variants and symbols from Asian code pages. It was
    primarily the need to accommodate those *Eastern* users that drove
    the setting of default values for the PUA.

    > This bias is also reflected in their
    > system software which (as far as I know with no exceptions) does not
    > allow users to specify properties for PUA characters other than the
    > default decided by the UTC.

    Bias? Or business sense?

    If you want some specialized behavior for software, you either
    write it yourself, or pay someone to write it, or convince someone
    else that adding such a feature to the software *they* write
    will pay for the investment cost in terms of incremental
    increased sales.

    You may not like how the software industry works, but thems
    the breaks for any mature industry.

    You may also want to drive a 3-wheeled car that runs on solar
    power. But if you want one, you'll probably have to build it
    yourself, because it is unlikely that you'll get GM or Ford
    or Toyota or Honda or Nissan or Daimler-Chrysler to do it
    for you.

    > At least you understand the problem which totally undermines your
    > argument here.

    *scratches head*

    > >You can do it privately. See above. But attempting to do such things
    > >in terms of formally specified usages of the PUA is an invitation
    > >to failure of interoperability.

    > I don't understand this last comment.

    Scenario: The UTC listens to you and defines some section of the PUA
    as strong right-to-left by default for use in PUA-defined bidirectional
    scripts. Somebody else is *already* using that section of the PUA
    for something else. Now they have an interoperability problem,
    because the default behavior they were depending on changes
    in some future version of some software, not under their control,
    and their data gets munged by bidi.
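
    A toy sketch of that scenario, in Python. Everything named here is
    hypothetical: the reassigned range, the two "default" predicates, and
    the reordering function, which simply reverses each run of
    right-to-left characters and is nowhere near the real bidirectional
    algorithm (UAX #9). It only shows how stored data that never changed
    can come out in a different display order once a default flips.

```python
from itertools import groupby

# Hypothetical: a section of the PUA that some future default makes strong R.
REASSIGNED = range(0xE000, 0xE100)

def old_default(ch: str) -> bool:
    """Today's default: no PUA character is right-to-left."""
    return False

def new_default(ch: str) -> bool:
    """Imagined future default: the reassigned section is right-to-left."""
    return ord(ch) in REASSIGNED

def display_order(text: str, is_rtl) -> str:
    """Toy reordering: reverse each maximal run of right-to-left characters."""
    out = []
    for rtl, run in groupby(text, key=is_rtl):
        chars = list(run)
        out.extend(reversed(chars) if rtl else chars)
    return "".join(out)

# Existing data: someone's private symbols, stored in left-to-right order.
data = "A\uE001\uE002\uE003B"

before = display_order(data, old_default)  # same order as stored
after = display_order(data, new_default)   # PUA run is now reversed
```

    The bytes on disk are identical in both cases; only the software's
    default changed, which is exactly what the existing user cannot
    control.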

    This is the kind of stuff the UTC refuses to start up by trying
    to provide some subdivision of semantics in the PUA. *That* is
    the principle, by the way, which guides the UTC position on
    the PUA: Use at your own risk, by private agreement.

    > What
    > we do want is compatibility between our applications and the system
    > software, and this proposal is the way to do that.

    I don't see how any proposal to create some particular behavior
    in the PUA is a way to accomplish that.

    > >Nope. You're wrong. A default value for a property is not a
    > >requirement by the UTC regarding what a PUA character can or may
    > >or must be used for.
    > >
    > Yes. If a default value is not a requirement, then a CHANGE to a default
    > value is not a requirement. You have no good reason not to make a change
    > to the default value for some PUA characters.

    Huh? The UTC has every reason not to make any change in the default
    values for any PUA characters. (See above.)

    A default value for a property is not a requirement by the UTC
    *ON AN IMPLEMENTER* that they use that value. They can use whatever
    property values they desire, but if they depart from what system
    platforms provide them (by default) then they are buying themselves
    an implementation task to get characters to do what they want.
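
    A minimal sketch of what such an implementation task starts with,
    assuming a hypothetical private agreement that U+E000..U+E07F holds a
    right-to-left script. The range and the function name are made up for
    illustration; the only library call used is Python's standard
    unicodedata.bidirectional(), which should report the published
    default (class L) for private-use characters.

```python
import unicodedata

# Hypothetical private agreement: this PUA section is strong right-to-left.
PRIVATE_RTL = range(0xE000, 0xE080)

def bidi_class(ch: str) -> str:
    """Return the bidi class, letting the private agreement override the default."""
    if ord(ch) in PRIVATE_RTL:
        return "R"
    # Everything else falls back to whatever the platform's data provides.
    return unicodedata.bidirectional(ch)

inside = bidi_class("\uE000")   # "R", by private agreement
outside = bidi_class("\uE080")  # platform default for the rest of the PUA
```

    The override itself is a few lines. The real work is that every
    layer that consults bidi classes (layout, cursor movement, selection)
    has to go through this lookup instead of the platform's, and that is
    the part nobody else will do for you.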

    > I see the point about not proliferating separate PUA spaces. But that is
    > the only argument I see on your side. Perhaps the UTC will be less dead
    > set against this if the arguments are realised, and perhaps if the few
    > non-western UTC members realise how the process is biased against the
    > languages of their countries.

    This is more utter baloney, I'm afraid. The UTC has done more to
    bring non-western writing systems under the big tent of modern
    software development and global IT infrastructure than any 6
    other standardization organizations you could name, combined.

    --Ken


