Re: Defined Private Use was: SSP default ignorable characters

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Apr 28 2004 - 02:42:56 EDT

    Ernest Cline <ernestcline at mindspring dot com> wrote:

    > As others have pointed out in the past, because there exist Private
    > Use scripts that expect the current existing Private Use defaults,
    > it would not be a good idea for Unicode to change the defaults of
    > existing Private Use characters.

    Technically, Unicode has only defined the defaults for the code
    positions, not for the "existing Private Use characters" that occupy
    them.

    Once you assign a character to a PUA code point, you have the right (and
    IMHO the responsibility) to assign appropriate properties to it. (As
    Peter Kirk points out, though, that doesn't magically endow software
    with the ability to recognize those properties.)
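To make the distinction concrete, here is a minimal Python sketch, assuming the standard unicodedata module as the "software" in question. The table PUA_OVERRIDES and the helper general_category are hypothetical names invented for this illustration; they stand in for whatever private agreement the users of a PUA script have made.

```python
import unicodedata

# Hypothetical private agreement: General Category values that the users
# of some PUA script have assigned to their own characters.
PUA_OVERRIDES = {0xE06A: "No"}


def general_category(cp, overrides=PUA_OVERRIDES):
    """Return the privately agreed category if there is one, else the UCD default."""
    return overrides.get(cp, unicodedata.category(chr(cp)))


# Stock software sees only the default for the code position.
print(unicodedata.category("\uE06A"))   # Co
# Software that knows about the private agreement can do better.
print(general_category(0xE06A))         # No
print(general_category(0xE06B))         # Co (no override registered)
```

The lookup falls back to the published default, so code points outside the private agreement behave exactly as they do in unmodified software.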

    > It would be slightly helpful if there
    > was a means to include guidance as to which set of Private Use
    > characters was being used. It would be quite a while before
    > applications would be able to take advantage of such information,
    > but it could serve as the basis of a long term solution.
    >
    > U+E0002 PRIVATE USE REGISTRY TAG
    >
    > might be a possible long-term solution that would enable
    > such information to be stored in-band, altho given the lack
    > of acceptance for in-band tags, the most I would expect
    > from it would be to help define a common standard for tagging
    > the set of private use characters in use that could be adopted
    > by markup rather than the use of the tag characters themselves.

    First, this will never ever happen, because it would turn the UTC into
    operators of a meta-registry -- a registry of PUA registries -- and this
    is the very antithesis of the UTC's approach to the PUA.

    Second, you are correct that the Plane 14 tagging mechanism has met
    with, to say the least, a lack of acceptance. I warned in 2002 that
    deprecation or even "strong discouragement" of the Plane 14 language
    tags would be equivalent to implicit deprecation of the entire Plane 14
    tag concept. As it turns out, a lot of people think that's a good idea.

    > TENGWAR DUODECIMAL DIGITS TEN and ELEVEN
    > present an interesting problem. They are digits, but not
    > decimal digits. Should the concept of General Category
    > Nd be expanded to include non-decimal number systems?

    No, the "d" stands for Decimal. This category is deliberately limited
    to characters that can be concatenated to form numbers in a base-10
    positional number system. It's a fact of life that base-12 and base-16
    digits are relegated to category No.

    > Or would
    > E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;
    > be sufficient?

    I think the General Category has to be No rather than Nl. Very few
    characters are of type Nl -- just the Roman numerals, "Hangzhou"
    numerals and Ideographic Zero, and Runic and Gothic letter-numbers.
    Tengwar duodecimal digits aren't letters that got pressed into service
    as numbers, they're just digits that happen to be base-12.

    Also, of the three "10" values, you need to remove the first. That
    field is the decimal digit value, and it's only valid for characters
    with the decimal digit property (see
    http://www.unicode.org/Public/UNIDATA/UCD.html for more details). The
    other two, the digit value and the numeric value, are OK.
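As a rough illustration of how the three numeric fields line up with the categories Nd, No and Nl, here is a small Python sketch using the standard unicodedata module. The helper name numeric_profile is made up for this example, and the sample characters are standard ones, since PUA code points carry no numeric properties in the published data.

```python
import unicodedata


def numeric_profile(ch):
    """Return (category, decimal, digit, numeric), with None where undefined."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# DIGIT SEVEN, SUPERSCRIPT TWO, ROMAN NUMERAL TEN
for ch in ("7", "\u00B2", "\u2169"):
    print(f"U+{ord(ch):04X}", numeric_profile(ch))
```

Only the Nd character has a decimal digit value. The No character keeps a digit value and a numeric value, and the Nl character has a numeric value alone.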

    In summary, your listing should probably be:

    E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;

    Compare with the properties I assembled for my invented script, at
    http://users.adelphia.net/~dewell/ew-props.html; in particular:

    E6CA;EWELLIC HEXADECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;
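For anyone maintaining such a private listing, the check described above can be automated. The following is only a sketch, assuming records in the 15-field, semicolon-separated layout of UnicodeData.txt; the names FIELDS, parse_record and check_numeric_fields are hypothetical, and the check covers just the decimal-digit rule, not the full set of UCD constraints.

```python
# Field names for the 15 semicolon-separated fields of a UnicodeData.txt
# record (labels chosen for this sketch).
FIELDS = (
    "code", "name", "gc", "ccc", "bidi", "decomp",
    "decimal", "digit", "numeric", "mirrored",
    "old_name", "comment", "upper", "lower", "title",
)


def parse_record(line):
    """Split one record into a dict keyed by FIELDS."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def check_numeric_fields(rec):
    """Return a list of problems with the decimal digit value field."""
    problems = []
    if rec["decimal"] and rec["gc"] != "Nd":
        problems.append("decimal digit value set, but General Category is not Nd")
    if rec["gc"] == "Nd" and not rec["decimal"]:
        problems.append("General Category is Nd, but decimal digit value is empty")
    return problems


proposed = "E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;"
revised = "E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;"

print(check_numeric_fields(parse_record(proposed)))  # one problem reported
print(check_numeric_fields(parse_record(revised)))   # []
```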

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


