From: Peter Kirk (email@example.com)
Date: Wed Mar 31 2004 - 12:11:39 EST
On 30/03/2004 16:30, Kenneth Whistler wrote:
>Uh, sorry, Peter, but the implications here are so much b...., err, ...
>The majority of the world's scripts are left-to-right. They also
>happen to be non-Western. There are more *Indic* scripts encoded
>in the Unicode Standard than *Western* scripts.
>The majority of *entities* that the majority of users put into
>PUA characters in actual application usage are unencoded CJK
>ideograph variants and symbols from Asian code pages. It was
>primarily the need to accommodate those *Eastern* users that drove
>the setting of default values for the PUA.
OK, in that case let's allocate properties to PUA characters in
proportion to the number of RTL vs LTR scripts, and the proportion of
combining marks vs. base characters, in actual encoded scripts. The
majority of PUA characters are unchanged. A significant minority become
RTL or non-spacing.
A lot of effort has gone into accommodating certain *Eastern* users.
Something like 100,000 CJK characters have already been defined, and
already that is not enough and they have requisitioned two more planes
of PUA with LTR properties. Fair enough if they might be needed. But
what if users of certain other scripts e.g. RTL scripts want just a
handful of PUA characters with the properties they need? Why is
preference given to CJK? This sounds like bias to me even if I was wrong
to call it western.
>>This bias is also reflected in their
>>system software which (as far as I know with no exceptions) does not
>>allow users to specify properties for PUA characters other than the
>>default decided by the UTC.
>Bias? Or business sense?
>If you want some specialized behavior for software, you either
>write it yourself, or pay someone to write it, or convince someone
>else that adding such a feature to the software *they* write
>will pay for the investment cost in terms of incremental
>You may not like how the software industry works, but thems
>the breaks for any mature industry.
Well, I don't quite see why it makes business sense for software companies
to support the huge PUAs for variant CJK characters, beyond the 100,000
or so already defined by Unicode. I do understand that it makes business
sense not to support user specification of properties, because that
would be hard work for little or no gain.
>Scenario: The UTC listens to you and defines some section of the PUA
>as strong right-to-left by default for use in PUA-defined bidirectional
>scripts. Somebody else is *already* using that section of the PUA
>for something else. Now they have an interoperability problem,
>because the default behavior they were depending on changes over
>in some future version of some software, not under their control,
>and their data gets munged by bidi.
Well, they weren't supposed to rely on these default properties anyway,
they were supposed to use the PUA at their own risk. They are not the
only ones who are messed up by features of software which is not under
their control. But it might be preferable in practice to define an
additional PUA with RTL properties and one with default ignorable
properties, outside all of the existing PUAs. I am not asking for a
large space; very likely 256 characters of each type would be more than enough.
>This is the kind of stuff the UTC refuses to start up by trying
>to provide some subdivision of semantics in the PUA. *That* is
>the principle, by the way, which guides the UTC position on
>the PUA: Use at your own risk, by private agreement.
>>we do want is compatibility between our applications and the system
>>software, and this proposal is the way to do that.
>I don't see how any proposal to create some particular behavior
>in the PUA is a way to accomplish that.
If a new PUA is created with default RTL properties, one can expect that
system software will soon support it, at least to the extent of treating
these characters as RTL for the purposes of the bidi algorithm etc. Similarly with
>A default value for a property is not a requirement by the UTC
>*ON AN IMPLEMENTER* that they use that value. They can use whatever
>property values they desire, but if they depart from what system
>platforms provide them (by default) then they are buying themselves
>an implementation task to get characters to do what they want.
Ken, you are a master of understatement. The task they are buying
themselves is a rewrite of the whole system. Companies don't provide the
details needed for others to customise individual modules, and it would
probably be a breach of copyright etc to attempt to do so. Open Source
is different here, of course.
-- Peter Kirk firstname.lastname@example.org (personal) email@example.com (work) http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Wed Mar 31 2004 - 13:01:11 EST