From: Kenneth Whistler (email@example.com)
Date: Wed Mar 31 2004 - 16:30:25 EST
> There are currently some 10 totally unused planes, with not even any
> tentative plans for them. Allocating one or two of those into additional
> Private Use Areas with a variety of default characteristics instead of
> the monotonous default characteristics of the existing Private Use
> Areas should not prove too difficult.
Fine. Make your formal proposal to the UTC and to SC2/WG2 and
see whether it is "difficult" or not to convince the committees
of the appropriateness of your approach.
> For example, 26 blocks of 128
> Private Use Combining Marks each, each block corresponding to
> one of the existing canonical combining classes (with perhaps a
> larger block for class 0) would amply satisfy the needs of most
> private use scripts for combining marks. Similarly, blocks for
> additional characters that would have other properties
which would be what, exactly?
> be simple to define and for most combinations of property values,
which would be what, exactly?
As of Unicode 4.0.1, PropertyAliases.txt now lists 82 distinct
character properties. Some of those, particularly those most
relevant to complex script behavior and rendering, such as
General_Category, Bidi_Class, Canonical_Combining_Class, Joining_Type,
etc., are multi-valued. Do you have any idea how big the numbers
start getting when combinatorics start to get involved here?
Or are you planning to do the research first, via a comprehensive
implementation of character properties such as ICU, to
determine what the actual existing number of combinations of
property values is for the existing repertoire and properties
and then make a principled projection of that into the
uncertain world of characters for scripts which have not yet
been encoded or modeled?
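The kind of survey described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the `unicodedata` module from the Python standard library rather than ICU, it reflects whatever UCD version ships with the Python runtime (not Unicode 4.0.1), and it looks at just four properties because `unicodedata` does not expose Joining_Type or most of the others listed in PropertyAliases.txt. The function names `property_signature` and `count_signatures` are made up for this example.

```python
import sys
import unicodedata
from collections import Counter


def property_signature(ch: str) -> tuple:
    """Return a small tuple of multi-valued properties for one character."""
    return (
        unicodedata.category(ch),          # General_Category
        unicodedata.bidirectional(ch),     # Bidi_Class
        unicodedata.combining(ch),         # Canonical_Combining_Class
        unicodedata.east_asian_width(ch),  # East_Asian_Width
    )


def count_signatures(limit: int = sys.maxunicode + 1) -> Counter:
    """Count how many assigned code points share each property signature."""
    counts = Counter()
    for cp in range(limit):
        ch = chr(cp)
        # Skip unassigned (Cn), surrogate (Cs), and private use (Co) code points,
        # so that only the encoded repertoire is surveyed.
        if unicodedata.category(ch) in ("Cn", "Cs", "Co"):
            continue
        counts[property_signature(ch)] += 1
    return counts
```

Calling `len(count_signatures())` gives the number of distinct combinations actually in use for these four properties, and `count_signatures().most_common(10)` shows which ones dominate. Adding more properties to the signature can only keep that number the same or increase it.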
> 128 characters should also prove to be exceedingly ample
> I'd have to take the time to list them, but a quick glance convinces
> me that there are at most several hundred combinations that would
> need to be supported if we limit things to just those combinations
> already in use.
This may be correct, but you'd have to make the case based
on the existing data from property implementations.
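For reference, the arithmetic behind the quoted estimate is easy to check. The sketch below assumes one block of 128 code points per property combination, as in the quoted text, and ignores the two noncharacters at the end of each plane; the function names are invented for this example, and the combination counts passed in are inputs to be supplied from real data, not figures from this thread.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane
BLOCK_SIZE = 128      # block size proposed in the quoted text


def blocks_per_plane(block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks of the given size that fit in one plane."""
    return PLANE_SIZE // block_size


def planes_needed(combinations: int, block_size: int = BLOCK_SIZE) -> int:
    """Planes needed to give each combination one block (ceiling division)."""
    code_points = combinations * block_size
    return -(-code_points // PLANE_SIZE)
```

With these definitions, `blocks_per_plane()` is 512, so one plane covers up to 512 combinations and two planes cover up to 1,024. The 26 combining-class blocks alone would take 26 * 128 = 3,328 code points.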
> (it might take more, if for example all 256 potential
> combining classes were supported instead of the 26 listed in
> UCD.html). At 128 characters per combination plus more for a
> few that might need them, it should prove possible to handle this
> in 1 or 2 planes.
Which still begs the fundamental questions:
Why this scheme instead of a much more flexible scheme, as
outlined by Rick, for having an implementation with API support
for establishing PUA properties on an as-needed basis? (Which
requires *no* action by the UTC at all, by the way.)
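To make the idea of as-needed PUA properties concrete, here is a minimal sketch of what such an API could look like. It is not Rick's outline and not any existing library's interface: `CharProps`, `PuaPropertyRegistry`, `define`, and `lookup` are hypothetical names, and only three properties are modeled. Lookups for non-PUA characters fall through to Python's standard `unicodedata` module.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CharProps:
    category: str    # General_Category
    bidi_class: str  # Bidi_Class
    combining: int   # Canonical_Combining_Class


def is_private_use(cp: int) -> bool:
    """True for the BMP PUA and the two supplementary private use planes."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


class PuaPropertyRegistry:
    """Hypothetical registry of application-supplied PUA properties."""

    def __init__(self) -> None:
        self._overrides: Dict[int, CharProps] = {}

    def define(self, cp: int, props: CharProps) -> None:
        """Assign properties to a private use code point."""
        if not is_private_use(cp):
            raise ValueError(f"U+{cp:04X} is not a private use code point")
        self._overrides[cp] = props

    def lookup(self, cp: int) -> CharProps:
        """Return the override if one exists, else the standard properties."""
        if cp in self._overrides:
            return self._overrides[cp]
        ch = chr(cp)
        return CharProps(unicodedata.category(ch),
                         unicodedata.bidirectional(ch),
                         unicodedata.combining(ch))
```

An application using a private script would call something like `registry.define(0xE000, CharProps("Mn", "NSM", 230))` to make U+E000 behave as an above-base combining mark. The property table lives entirely in the implementation, so any combination of values can be set without reserving code space for it in advance.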
What makes you think, once you have such a scheme of property
combinations worked out, and once you convinced the UTC of
it (which I doubt), that you could also convince SC2/WG2 to
do something comparable in 10646 to keep the standards in synch?
Recall that SC2/WG2 has almost *no* concept of character properties --
those are added by the Unicode Standard. Bring in a proposal
that says, "We need to add two more planes of private use
characters, with these special properties, because XYZ..." and
you'll get a row of blank stares from the national body
representatives.
Finally, assuming that you could get something like this into
the standards, what makes you think that the platform vendors
would complicate and expand their character property tables
to support this speculative scheme? They have the option to
not support all characters in the standard, and a new plane or
two full of PUA characters with a checkerboard of speculative
property assignments strike me as prime candidates for the
kind of stuff they would simply say, "We have no interest in
supporting these things."
I think you're spitting into the wind if you think you can
force, through the character standardization process, the
major platform vendors to support the kind of PUA functionality
you are after, when they could do so *today* via much more
extensible and architecturally sensible means given the
existing PUA characters, but have not yet chosen to do so.