Re: Defined Private Use was: SSP default ignorable characters

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Apr 28 2004 - 02:42:56 EDT

Next message: Antoine Leca: "Re: Croatian"

Previous message: Peter Constable: "RE: Romanian and Cyrillic"
In reply to: Ernest Cline: "Re: Defined Private Use was: SSP default ignorable characters"
Next in thread: John Cowan: "Re: Defined Private Use was: SSP default ignorable characters"
Reply: John Cowan: "Re: Defined Private Use was: SSP default ignorable characters"
Reply: C J Fynn: "Re: Defined Private Use was: SSP default ignorable characters"

Ernest Cline <ernestcline at mindspring dot com> wrote:

> As others have pointed out in the past, because there exist Private
> Use scripts that expect the current existing Private Use defaults,
> it would not be a good idea for Unicode to change the defaults of
> existing Private Use characters.

Technically, Unicode has only defined the defaults for the code
positions, not for the "existing Private Use characters" that occupy
them.

Once you assign a character to a PUA code point, you have the right (and
IMHO the responsibility) to assign appropriate properties to it. (As
Peter Kirk points out, though, that doesn't magically endow software
with the ability to recognize those properties.)
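
A minimal sketch of that point, using Python's standard `unicodedata` module as the "software" in question. It knows only the default property for a PUA code point (General Category Co). An application that assigns its own properties therefore has to consult a private table first. The table contents and the helper name `category_of` are hypothetical. The single entry reuses the Tengwar code point U+E06A discussed later in this thread.

```python
import unicodedata

# Hypothetical private property table. Only the General Category is
# overridden here; a real agreement would cover every property needed.
PRIVATE_CATEGORIES = {
    "\ue06a": "No",  # TENGWAR DUODECIMAL DIGIT TEN (private agreement, not Unicode)
}

def category_of(ch: str) -> str:
    """Return the privately agreed General Category if there is one,
    otherwise the Unicode default reported by unicodedata."""
    return PRIVATE_CATEGORIES.get(ch, unicodedata.category(ch))

print(unicodedata.category("\ue06a"))  # Co: Unicode's default for the code position
print(category_of("\ue06a"))           # No: the private assignment
print(category_of("A"))                # Lu: ordinary characters fall through
```

Software that has never seen the private table still reports Co. That is exactly the limitation noted above.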

> It would be slightly helpful if there
> were a means to include guidance as to which set of Private Use
> characters was being used. It would be quite a while before
> applications would be able to take advantage of such information,
> but it could serve as the basis of a long term solution.
>
> U+E0002 PRIVATE USE REGISTRY TAG
>
> might be a possible long-term solution that would enable
> such information to be stored in-band, although, given the lack
> of acceptance for in-band tags, the most I would expect
> from it would be to help define a common standard for tagging
> the set of private use characters in use that could be adopted
> by markup rather than the use of the tag characters themselves.

First, this will never ever happen, because it would turn the UTC into
operators of a meta-registry -- a registry of PUA registries -- and this
is the very antithesis of the UTC's approach to the PUA.

Second, you are correct that the Plane 14 tagging mechanism has met
with, to say the least, a lack of acceptance. I warned in 2002 that
deprecation or even "strong discouragement" of the Plane 14 language
tags would be equivalent to implicit deprecation of the entire Plane 14
tag concept. As it turns out, a lot of people think that's a good idea.
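
For readers unfamiliar with the Plane 14 mechanism, the following is a minimal sketch of how a language tag is spelled in-band:

* U+E0001 LANGUAGE TAG comes first.
* It is followed by tag characters U+E0020..U+E007E, which mirror printable ASCII.
* U+E007F CANCEL TAG ends the tagged run.

Ernest's proposed U+E0002 PRIVATE USE REGISTRY TAG would presumably have followed the same pattern. It was never encoded, so `unicodedata` reports it as unassigned (Cn). The function names are illustrative only.

```python
import unicodedata

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG

def to_tag_chars(text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) onto tag characters U+E0020..U+E007E."""
    if not all(0x20 <= ord(c) <= 0x7E for c in text):
        raise ValueError("tag content must be printable ASCII")
    return "".join(chr(0xE0000 + ord(c)) for c in text)

def language_tagged(text: str, lang: str) -> str:
    """Prefix text with an in-band language tag and end with CANCEL TAG."""
    return LANGUAGE_TAG + to_tag_chars(lang) + text + CANCEL_TAG

s = language_tagged("colour", "en-GB")
print([f"U+{ord(c):04X}" for c in s[:3]])  # ['U+E0001', 'U+E0065', 'U+E006E']
print(unicodedata.category(LANGUAGE_TAG))  # Cf
# The proposed PRIVATE USE REGISTRY TAG was never encoded:
print(unicodedata.category("\U000E0002"))  # Cn
```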

> TENGWAR DUODECIMAL DIGITS TEN and ELEVEN
> present an interesting problem. They are digits, but not
> decimal digits. Should the concept of General Category
> Nd be expanded to include non-decimal number systems?

No, the "d" stands for Decimal. This category is deliberately limited
to characters that can be concatenated to form numbers in a base-10
positional number system. It's a fact of life that base-12 and base-16
digits are relegated to category No.
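
The distinction is visible in Python's `unicodedata` module:

* `decimal()` succeeds only for Nd characters.
* `digit()` also covers No characters such as SUPERSCRIPT TWO.

A minimal sketch follows. Western digits stand in for the Tengwar digits zero through nine, whose code points are not given here. Only U+E06A is mapped to the value ten, through a hypothetical private table.

```python
import unicodedata

def decimal_number(s: str) -> int:
    """Read a string of Nd characters as a base-10 positional number."""
    value = 0
    for ch in s:
        # unicodedata.decimal raises ValueError for anything that is not Nd
        value = value * 10 + unicodedata.decimal(ch)
    return value

# Hypothetical private digit values agreed on outside Unicode.
PRIVATE_DIGITS = {"\ue06a": 10}  # TENGWAR DUODECIMAL DIGIT TEN

def duodecimal_number(s: str) -> int:
    """Read a string as base 12, consulting PRIVATE_DIGITS first."""
    value = 0
    for ch in s:
        d = PRIVATE_DIGITS.get(ch)
        if d is None:
            d = unicodedata.digit(ch)  # Western digits stand in for Tengwar 0..9
        value = value * 12 + d
    return value

print(decimal_number("\u0664\u0662"))  # ARABIC-INDIC FOUR, TWO: 42
print(unicodedata.digit("\u00b2"))     # SUPERSCRIPT TWO (No) has a digit value: 2
try:
    unicodedata.decimal("\u00b2")
except ValueError:
    print("SUPERSCRIPT TWO has no decimal value")
print(duodecimal_number("1\ue06a"))    # 1 * 12 + 10 = 22
```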

> Or would
> E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;
> be sufficient?

I think the General Category has to be No rather than Nl. Very few
characters are of type Nl -- just the Roman numerals, "Hangzhou"
numerals and Ideographic Zero, and Runic and Gothic letter-numbers.
Tengwar duodecimal digits aren't letters that got pressed into service
as numbers; they're just digits that happen to be base-12.
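
As a quick check of the pattern described above, the sketch below queries a sample of the Nl characters just named with Python's `unicodedata`. Each has a numeric value but neither a decimal nor a digit value. The sample is illustrative rather than exhaustive. Unicode versions newer than the one discussed here add further Nl characters.

```python
import unicodedata

# A few Nl characters of the kinds listed above; not an exhaustive list.
samples = [
    "\u216b",      # ROMAN NUMERAL TWELVE
    "\u3021",      # HANGZHOU NUMERAL ONE
    "\u3007",      # IDEOGRAPHIC NUMBER ZERO
    "\u16ee",      # RUNIC ARLAUG SYMBOL
    "\U00010341",  # GOTHIC LETTER NINETY
]

for ch in samples:
    # Print code point, name, category, numeric, decimal and digit values;
    # decimal and digit come back as None for these letter-numbers.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.numeric(ch),
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
    )
```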

Also, of the three "10" values, you need to remove the first -- it's
only valid for characters with the decimal digit property (see
http://www.unicode.org/Public/UNIDATA/UCD.html for more details). The
other two are OK.

In summary, your listing should probably be:

E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;
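
The field rule above can be checked mechanically. The minimal sketch below splits a UnicodeData.txt-style line into its 15 semicolon-separated fields and looks only at these fields:

* field 2 (General Category)
* field 6 (decimal digit)
* field 7 (digit)
* field 8 (numeric)

The function name and the exact set of checks are assumptions for illustration, not an official validator.

```python
def check_numeric_fields(line: str) -> list[str]:
    """Return problems found in the numeric fields of one
    UnicodeData.txt-style entry (an empty list means none found)."""
    fields = line.split(";")
    if len(fields) != 15:
        return [f"expected 15 fields, found {len(fields)}"]
    gc, decimal, digit, numeric = fields[2], fields[6], fields[7], fields[8]
    problems = []
    # The decimal digit field is only valid for General Category Nd.
    if decimal and gc != "Nd":
        problems.append(f"decimal value {decimal} requires Nd, not {gc}")
    if gc == "Nd" and not decimal:
        problems.append("Nd character lacks a decimal value")
    # A decimal value is repeated in the digit field, a digit value in numeric.
    if decimal and digit != decimal:
        problems.append("digit field must repeat the decimal value")
    if digit and numeric != digit:
        problems.append("numeric field must repeat the digit value")
    return problems

original = "E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;"
revised = "E06A;TENGWAR DUODECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;"

for entry in (original, revised):
    print(entry.split(";")[1], check_numeric_fields(entry) or "OK")
```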

Compare with the properties I assembled for my invented script, at
http://users.adelphia.net/~dewell/ew-props.html; in particular:

E6CA;EWELLIC HEXADECIMAL DIGIT TEN;No;0;L;;;10;10;N;;;;;

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


This archive was generated by hypermail 2.1.5 : Wed Apr 28 2004 - 03:19:13 EDT