Date: Fri Feb 23 2001 - 09:39:26 EST

On 02/22/2001 09:04:14 PM "Joel Rees" wrote:

>Actually, what I _think_ I am after is a standard way to let the standard
>committee(s) escape having to deal with idiosyncratic and personal
>characters until they actually come into general use.

Can anybody spell "PUA"? There are 137,068 (64K * 2, less 4 for xxFFFE and
xxFFFF, plus 6000 in the BMP) such codepoints available for exactly this.
If anyone needs more than that for idiosyncratic or personal characters,
then they should create their own systems because they're doing something
so obscure that it's not reasonable to expect others to invest in creating
systems to support it.
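
For anyone who wants to check that tally, here is a minimal sketch in Python.
It follows the arithmetic as posted; the variable names are illustrative
only. One assumption is labeled in the comments: the post uses 6000 for the
BMP private-use block, whereas the range U+E000..U+F8FF spans 6400 code
points, so the exact total comes out 400 higher.

```python
# Tally of Private Use Area (PUA) code points, following the post's arithmetic.
PLANE_SIZE = 0x10000  # 64K code points per plane

# Planes 15 and 16 are private use, except the last two code points of each
# plane (xxFFFE and xxFFFF), which are noncharacters: 4 in total.
supplementary_pua = 2 * PLANE_SIZE - 2 * 2

# The post uses 6000 for the BMP private-use block (apparently rounded);
# the range U+E000..U+F8FF actually contains 6400 code points.
bmp_pua_as_posted = 6000
bmp_pua_exact = 0xF8FF - 0xE000 + 1

pua_as_posted = supplementary_pua + bmp_pua_as_posted
pua_exact = supplementary_pua + bmp_pua_exact

print(pua_as_posted)  # 137068, the figure quoted above
print(pua_exact)      # 137468, using the exact BMP range
```

Either way, the order of magnitude, and so the argument, is unchanged.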

>I know you can't foresee breaking past 17 planes, but remember that ten
>years ago it was still a little hard to imagine GB hard drives being on
>everyone's desktops.

That comparison is like apples and oranges.

>I'm telling you that 17 planes is not enough, and it _will_ become a
>constraint in your lifetime.

And Ken's telling you - as would I - that you're wrong. And it's pretty
clear that in this argument the onus is on you to prove your case: The
Unicode codespace can support 1,111,998 characters (64K * 17, less 2*17 for
xxFFFE and xxFFFF, less 32 for FDD0-FDEF, less 2048 for the surrogates). If
we discount the PUA space, that's 974,930. As of TUS3.1, there are 94,140
encoded characters. That leaves 880,790 yet to be assigned. Put another
way, of the assignable codepoints, over 90% (.9034...) are still available.
That's after nearly one and a half decades of work. Remember, too, that the
94,140 characters were generally in wide use and well known during that
time by the parties involved. So, you've got to come up with a convincing
argument that there are more than 880,790 characters that are going to be
in wide use and that will merit standardization within any of our
lifetimes. The unlikelihood of you or anybody coming up with sufficient
evidence to make that case is such that I'd be willing to put less
constraint on you: present clear evidence that more than 880,790 characters
will *ever* be in wide use and will merit standardization on this planet.
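
The figures in that paragraph can be reproduced with a short Python sketch.
This is only a check of the arithmetic as posted, with illustrative variable
names; the PUA total (137,068) and the TUS 3.1 character count (94,140) are
taken from the message as given, not derived independently.

```python
# Reproduce the codespace arithmetic from the post.
PLANE_SIZE = 0x10000  # 64K code points per plane
PLANES = 17

codespace = PLANE_SIZE * PLANES            # every code point in 17 planes
plane_end_nonchars = 2 * PLANES            # xxFFFE and xxFFFF in each plane
fdd0_nonchars = 0xFDEF - 0xFDD0 + 1        # 32 noncharacters FDD0..FDEF
surrogates = 0xDFFF - 0xD800 + 1           # 2048 surrogate code points

usable = codespace - plane_end_nonchars - fdd0_nonchars - surrogates

pua_as_posted = 137_068                    # PUA total as stated in the post
encoded_in_tus_3_1 = 94_140                # encoded characters as of TUS 3.1

assignable = usable - pua_as_posted        # excluding private use
remaining = assignable - encoded_in_tus_3_1
fraction_free = remaining / assignable

print(usable)      # 1111998
print(assignable)  # 974930
print(remaining)   # 880790
print(f"{fraction_free:.4f}")  # 0.9034
```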

>Maybe I'm a crackpot,

If so, you're in plenty of company here. :-)

>but the need is there and people will use and abuse
>UNICODE in ways that you probably don't want to imagine. What I'm trying
>to push is building the mechanism now for dodging most of the abuse.

The mechanism exists. It's called the Private Use Area. Everybody go and
play in that sandbox to your heart's content. If you do things in there
that are indecent, then don't go telling the rest of us about it.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

