Re: What is the principle?

From: Peter Kirk
Date: Wed Mar 31 2004 - 16:42:32 EST

    On 31/03/2004 12:28, Ernest Cline wrote:

    >> ...
    >>This is the kind of stuff the UTC refuses to start up by trying
    >>to provide some subdivision of semantics in the PUA. *That* is
    >>the principle, by the way, which guides the UTC position on
    >>the PUA: Use at your own risk, by private agreement.
    >Which is why, if any private use characters with default characteristics
    >other than those of the existing Private Use blocks are ever to be part of
    >Unicode, they will need to be added as additional Private Use blocks,
    >not by redefining existing PUAs.
    >There are currently some 10 totally unused planes, with not even any
    >tentative plans for them. Allocating one or two of those to additional
    >Private Use Areas with a variety of default characteristics, instead of
    >the monotonous default characteristics of the existing Private Use
    >Areas, should not prove too difficult. For example, 26 blocks of 128
    >Private Use Combining Marks each, each block corresponding to
    >one of the existing canonical combining classes (with perhaps a
    >larger block for class 0), would amply satisfy the needs of most
    >private use scripts for combining marks. Similarly, blocks for
    >additional characters with other properties should
    >be simple to define, and for most combinations of property values
    >128 characters should also prove exceedingly ample.
    >I'd have to take the time to list them, but a quick glance convinces
    >me that there are at most several hundred combinations that would
    >need to be supported if we limit things to just those combinations
    >already in use. (It might take more if, for example, all 256 potential
    >combining classes were supported instead of the 26 listed in
    >UCD.html.) At 128 characters per combination, plus more for a
    >few that might need them, it should prove possible to handle this
    >in 1 or 2 planes.

    Ernest, I support your general ideas here. But I am concerned about the
    implications of defining PUA characters with combining classes other
    than zero. I can see this causing confusion in normalisation and related
    processes. And it would hugely multiply the number of PUA characters
    required.
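    To put numbers on the sizes under discussion, here is a back-of-the-envelope
    calculation. The 128-per-block and 26-class figures come from Ernest's
    message; the combination counts tried are illustrative guesses at "several
    hundred", not counts taken from the UCD.

```python
# Back-of-the-envelope sizing for the extended-PUA layout quoted above.
# BLOCK_SIZE and COMBINING_CLASSES are Ernest's figures; the combination
# counts tried below are illustrative guesses at "several hundred".
BLOCK_SIZE = 128           # characters per property combination
COMBINING_CLASSES = 26     # classes listed in UCD.html, per the quoted message
PLANE_SIZE = 0x10000       # 65,536 code points per plane (incl. 2 noncharacters)


def planes_needed(combinations: int, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of a plane filled by `combinations` blocks of `block_size`."""
    return combinations * block_size / PLANE_SIZE


combining_marks = COMBINING_CLASSES * BLOCK_SIZE
print(f"combining-mark blocks: {combining_marks} code points")
print(f"all 256 classes: {256 * BLOCK_SIZE} code points")

for combos in (300, 500, 1000):
    print(f"{combos} combinations: {combos * BLOCK_SIZE} code points, "
          f"{planes_needed(combos):.2f} planes")
```

    On these assumptions the 26 combining classes need 3,328 code points and
    all 256 classes would need 32,768 (half a plane); 500 combinations fit
    just under one plane (64,000 code points), while 1,000 need nearly two.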

    Let's think about when one might need PUA characters with cc>0. The
    relevant cases are all like <B, M1, M2>, where B is a base character and
    M1 and M2 are combining characters, one or both of them in your proposed
    extended PUA. And cc>0 is required only if you want this sequence to be
    canonically equivalent to <B, M2, M1>, so that one ordering is converted
    to the other during normalisation. Such a reordering can only happen if
    M1 and M2 both have cc>0 and their classes differ.
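    The reordering rule can be checked with Python's standard unicodedata
    module. The sketch below is a simplified re-implementation of the
    canonical ordering step only (no decomposition), applied to characters
    that have no decompositions, so its output can be compared directly with
    unicodedata.normalize("NFD", ...). U+E000 stands in for a PUA mark; like
    every current PUA code point it has combining class 0.

```python
import unicodedata


def canonical_order(s: str) -> str:
    """Stable-sort each maximal run of marks with ccc > 0 by combining class.

    This reproduces the effect of the Unicode canonical ordering step of
    NFD/NFC: two adjacent marks swap only when both have ccc > 0 and the
    first has the higher class. Any character with ccc == 0, which includes
    every PUA code point, acts as a barrier.
    """
    out = []
    run = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


ACUTE, DOT_BELOW, PUA = "\u0301", "\u0323", "\ue000"   # ccc 230, 220, 0

# <B, M1, M2> with different non-zero classes: reordered, matching NFD.
print(codepoints(canonical_order("a" + ACUTE + DOT_BELOW)))
print(codepoints(unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)))

# With a cc=0 PUA mark the two orders stay distinct: no canonical equivalence.
print(codepoints(canonical_order("a" + PUA + ACUTE)))
print(codepoints(canonical_order("a" + ACUTE + PUA)))
```

    The first two lines both print U+0061 U+0323 U+0301, while the last two
    keep their input order, which is exactly the distinction drawn above.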

    Is it really necessary to support canonical equivalence of PUA sequences
    to this level of detail? Would it not be enough for those specifying the
    PUA characters to declare one of the orderings correct and treat the
    other as a spelling error? I really can't see this requirement being
    widespread enough to justify defining the thousands of PUA characters
    with different combining classes that you propose.
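    A minimal sketch of the "spelling error" alternative, assuming a
    hypothetical private agreement that gives each PUA mark a rank: within a
    run of private marks the ranks must not decrease, and out-of-order
    sequences are flagged rather than reordered. The code points U+E000 and
    U+E001, their ranks, and the PRIVATE_MARK_RANK table are invented for
    illustration.

```python
# HYPOTHETICAL private agreement: each PUA mark gets a rank, and within a
# run of private marks the ranks must not decrease. U+E000/U+E001 and the
# ranks are invented for illustration.
PRIVATE_MARK_RANK = {
    "\ue000": 1,  # e.g. a mark written above the base
    "\ue001": 2,  # e.g. a mark written below the base
}


def misspellings(text: str) -> list:
    """Return indices of private marks that appear out of the agreed order.

    Any character not in PRIVATE_MARK_RANK ends the current run of marks.
    Nothing is reordered; the caller decides how to report the error.
    """
    errors = []
    prev_rank = 0
    for i, ch in enumerate(text):
        rank = PRIVATE_MARK_RANK.get(ch)
        if rank is None:
            prev_rank = 0
            continue
        if rank < prev_rank:
            errors.append(i)
        prev_rank = rank
    return errors


print(misspellings("x\ue000\ue001"))   # agreed order: []
print(misspellings("x\ue001\ue000"))   # reversed: [2]
```

    Unlike canonical reordering, this leaves the text untouched, so no
    normalisation code needs to know anything about the private characters.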

    My proposal would rather be for a single group of PUA combining marks
    which all have cc=0 and are all "default ignorable", with the result
    that they are not displayed when a regular font is selected. These could
    be used for non-standardised diacritics, for mark-up (I mean this in the
    old-fashioned sense of marks added to the text, rather than as a way of
    specifying formatting etc.), and also, in effect, as variation
    selectors if the private font specifies pseudo-digraphs. I don't know
    exactly how many might be required, but I am thinking of tens or
    hundreds rather than thousands.
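    A minimal model of the proposed behaviour, under loudly hypothetical
    assumptions: the range U+40000..U+400FF merely stands in for the proposed
    group (no such block exists in Unicode), and font_has_glyph is an invented
    callback representing the selected font's coverage. Because the marks have
    cc=0, normalisation leaves them where they are; because they are treated
    as default ignorable, a renderer whose font lacks them skips them instead
    of drawing a missing-glyph box.

```python
import unicodedata

# HYPOTHETICAL: a range standing in for the proposed group of cc=0,
# default-ignorable PUA marks. Plane 4 is used purely for illustration;
# Unicode assigns no such block.
PROPOSED_MARKS = range(0x40000, 0x40100)


def render_fallback(text: str, font_has_glyph) -> str:
    """Model a renderer: proposed marks the font lacks are skipped silently
    (default-ignorable behaviour) instead of showing a missing-glyph box."""
    return "".join(
        ch for ch in text
        if ord(ch) not in PROPOSED_MARKS or font_has_glyph(ch)
    )


mark = chr(0x40000)
word = "e" + mark + "\u0301"

# With cc=0 the mark is a starter, so normalisation never moves it.
assert unicodedata.combining(mark) == 0
assert unicodedata.normalize("NFD", word) == word

print(ascii(render_fallback(word, lambda ch: False)))  # regular font
print(ascii(render_fallback(word, lambda ch: True)))   # private font
```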

    Peter Kirk (personal) (work)

    This archive was generated by hypermail 2.1.5 : Wed Mar 31 2004 - 17:38:13 EST