Re: What is the principle?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 31 2004 - 16:30:25 EST


    Ernest suggested:

    > There are currently some 10 totally unused planes, with not even any
    > tentative plans for them. Allocating one or two of those to additional
    > Private Use Areas with a variety of default characteristics instead of
    > the monotonous default characteristics of the existing Private Use
    > Areas should not prove too difficult.

    Fine. Make your formal proposal to the UTC and to SC2/WG2 and
    see whether it is "difficult" or not to convince the committees
    of the appropriateness of your approach.

    > For example, 26 blocks of 128
    > Private Use Combining Marks each, each block corresponding to
    > one of the existing canonical combining classes (with perhaps a
    > larger block for class 0) would amply satisfy the needs of most
    > private use scripts for combining marks. Similarly, blocks for
    > additional characters that would have other properties
                                            ^^^^^^^^^^^^^^^^
                                            
    which would be what, exactly?

    > should
    > be simple to define and for most combinations of property values,
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                  
    which would be what, exactly?

    As of Unicode 4.0.1, PropertyAliases.txt lists 82 distinct
    character properties. Some of those, particularly those most
    relevant to complex script behavior and rendering, such as
    General_Category, Bidi_Class, Canonical_Combining_Class, Joining_Type,
    etc., are multi-valued. Do you have any idea how big the numbers
    get once combinatorics come into play?
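    To put rough numbers on that, here is a small Python sketch. The
    value counts are assumptions for illustration, approximately the
    Unicode 4.0 sets: 30 General_Category values, 19 Bidi_Class values,
    the 26 combining classes Ernest cites from UCD.html, and 6
    Joining_Type values. It takes the full cross product of just those
    four of the 82 properties and asks how many planes it would take to
    give each combination its own 128-character block.

```python
import math

# Assumed value counts for four of the 82 properties (illustrative, roughly Unicode 4.0).
VALUE_COUNTS = {
    "General_Category": 30,
    "Bidi_Class": 19,
    "Canonical_Combining_Class": 26,  # the 26 classes cited from UCD.html
    "Joining_Type": 6,
}

BLOCK_SIZE = 128       # characters per property combination, as in the proposal
PLANE_SIZE = 0x10000   # 65,536 code points per plane
TOTAL_PLANES = 17      # planes 0 through 16

def full_cross_product(counts):
    """Number of combinations if every value of every property may co-occur."""
    return math.prod(counts.values())

def planes_needed(combinations, block_size=BLOCK_SIZE):
    """Planes needed to give each combination its own block (ignores noncharacters)."""
    return math.ceil(combinations * block_size / PLANE_SIZE)

combos = full_cross_product(VALUE_COUNTS)
print(combos)                                # 88920
print(planes_needed(combos))                 # 174
print(planes_needed(combos) > TOTAL_PLANES)  # True
```

    Even four properties, if all their values can co-occur, need roughly
    ten times the entire 17-plane codespace. Any workable scheme has to
    pick a small subset of combinations, and deciding which subset is
    exactly the open question.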

    Or are you planning to do the research first, via a comprehensive
    implementation of character properties such as ICU, to
    determine what the actual existing number of combinations of
    property values is for the existing repertoire and properties
    and then make a principled projection of that into the
    uncertain world of characters for scripts which have not yet
    been encoded or modeled?
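    A full version of that research would need a comprehensive property
    implementation. As a much smaller sketch, Python's standard
    unicodedata module can count how many distinct (General_Category,
    Bidi_Class, Canonical_Combining_Class) combinations actually occur
    among assigned code points. Assumptions: the module reflects whatever
    Unicode version ships with your Python (see
    unicodedata.unidata_version), not 4.0.1, and it does not expose
    Joining_Type, so this undercounts the real combination space.

```python
import sys
import unicodedata
from collections import Counter

def property_combinations():
    """Count assigned code points per (category, bidi class, combining class) tuple."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn":
            continue  # skip unassigned code points
        key = (category, unicodedata.bidirectional(ch), unicodedata.combining(ch))
        counts[key] += 1
    return counts

combos = property_combinations()
print("Unicode", unicodedata.unidata_version)
print(len(combos), "distinct combinations in use")
for key, n in combos.most_common(5):
    print(key, n)
```

    The number of combinations in use is far smaller than the cross
    product, but it is a measurement of the existing repertoire, not a
    prediction of what unencoded scripts will need.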
                                  
    > 128 characters should also prove to be exceedingly ample

    For what?

    > I'd have to take the time to list them, but a quick glance convinces
    > me that there are at most several hundred combinations that would
    > need to be supported if we limit things to just those combinations
    > already in use.

    This may be correct, but you'd have to make the case based
    on the existing data from property implementations.

    > (it might take more, if for example all 256 potential
    > combining classes were supported instead of the 26 listed in
    > UCD.html). At 128 characters per combination plus more for a
    > few that might need them, it should prove possible to handle this
    > in 1 or 2 planes.
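    The capacity side of that arithmetic can be checked with a short
    Python sketch. One detail the proposal does not mention: the last two
    code points of every plane (U+xFFFE and U+xFFFF) are noncharacters, so
    a plane holds 511 complete 128-character blocks, not 512. The
    combination counts passed to fits() below are hypothetical stand-ins
    for "several hundred".

```python
PLANE_SIZE = 0x10000     # 65,536 code points per plane
BLOCK_SIZE = 128         # characters per property combination, as proposed
NONCHARS_PER_PLANE = 2   # U+xFFFE and U+xFFFF at the end of every plane

def full_blocks_per_plane(block_size=BLOCK_SIZE):
    """Complete blocks that fit in one plane once the noncharacters are excluded."""
    usable = PLANE_SIZE - NONCHARS_PER_PLANE
    return usable // block_size

def fits(combinations, planes, block_size=BLOCK_SIZE):
    """True if each combination can get its own block within the given planes."""
    return combinations <= planes * full_blocks_per_plane(block_size)

print(full_blocks_per_plane())      # 511
print(fits(300, 1))                 # True
print(fits(700, 1), fits(700, 2))   # False True
```

    So "1 or 2 planes" holds only while the supported set stays under
    about 1,000 combinations, with any larger blocks (such as the one
    suggested for class 0) coming out of the same budget.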

    Which still leaves open the fundamental questions:

    Why this scheme instead of a much more flexible scheme, as
    outlined by Rick, for having an implementation with API support
    for establishing PUA properties on an as-needed basis? (Which
    requires *no* action by the UTC at all, by the way.)
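    Rick's actual design is not quoted here, so the following is only a
    hypothetical Python sketch of the general idea: an application-level
    table that assigns properties to existing PUA code points on demand
    and falls back to the standard data for everything else. The names
    PUAPropertyRegistry, define and lookup are invented for illustration;
    only the unicodedata calls are real.

```python
import unicodedata
from dataclasses import dataclass

def is_private_use(cp):
    """True for the existing Private Use Areas: BMP, plane 15 and plane 16."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

@dataclass(frozen=True)
class CharProps:
    general_category: str
    bidi_class: str
    combining_class: int

class PUAPropertyRegistry:
    """Hypothetical per-application property table layered over the standard data."""

    def __init__(self):
        self._overrides = {}

    def define(self, cp, general_category, bidi_class, combining_class=0):
        # Only PUA code points may be redefined; encoded characters keep their properties.
        if not is_private_use(cp):
            raise ValueError(f"U+{cp:04X} is not a private-use code point")
        self._overrides[cp] = CharProps(general_category, bidi_class, combining_class)

    def lookup(self, cp):
        # Application-defined properties win; otherwise use the standard data.
        if cp in self._overrides:
            return self._overrides[cp]
        ch = chr(cp)
        return CharProps(unicodedata.category(ch),
                         unicodedata.bidirectional(ch),
                         unicodedata.combining(ch))

registry = PUAPropertyRegistry()
registry.define(0xE000, "Mn", "NSM", 230)  # a private combining mark above
print(registry.lookup(0xE000))
print(registry.lookup(0x0041))
```

    Each application defines only the combinations it actually uses, so
    the combinatorial explosion never has to be enumerated in advance,
    and nothing in the standard changes.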

    What makes you think, once you have such a scheme of property
    combinations worked out, and once you have convinced the UTC of
    it (which I doubt), that you could also convince SC2/WG2 to
    do something comparable in 10646 to keep the standards in synch?
    Recall that SC2/WG2 has almost *no* concept of character properties --
    those are added by the Unicode Standard. Bring in a proposal
    that says, "We need to add two more planes of private use
    characters, with these special properties, because XYZ..." and
    you'll get a row of blank stares from the national body
    representatives.

    Finally, assuming that you could get something like this into
    the standards, what makes you think that the platform vendors
    would complicate and expand their character property tables
    to support this speculative scheme? They have the option to
    not support all characters in the standard, and a new plane or
    two full of PUA characters with a checkerboard of speculative
    property assignments strike me as prime candidates for the
    kind of stuff they would simply say, "We have no interest in
    supporting these things."

    I think you're spitting into the wind if you think you can
    force, through the character standardization process, the
    major platform vendors to support the kind of PUA functionality
    you are after, when they could do so *today* via much more
    extensible and architecturally sensible means given the
    existing PUA characters, but have not yet chosen to do so.

    --Ken



    This archive was generated by hypermail 2.1.5 : Wed Mar 31 2004 - 17:13:59 EST