Re: Chess symbols, ZWJ, Opentype and holly type ornaments.

Date: Thu Jun 20 2002 - 02:48:40 EDT

Barry Caplan wrote:

>> Please stop bothering the list with your PUA plans: nobody cares --
>> that's what the "P" means.
> Is that the consensus here? .... Would it be nice to have a place
> to publicize PUA uses so that the codepoints can be used? Something
> like CPAN for Perl perhaps? Otherwise it seems like a lot of wheels
> might be reinvented...
> I do agree that maybe this list isn't the best place to announce such
> stuff, esp. if it gets to be high volume, but it seems like useful
> information to collect and disseminate to me....

There are two issues here. One is that, no, the list isn't the best
place to plop a 2,000-word essay explaining some ideas you are mulling
around in your head. It's better to upload something like that to a Web
site and post a relatively brief announcement to the list, so interested
people can have a look.

William did just that on Tuesday in his message "Chess fount code points
(Private Use Area) now published." and I for one appreciated it.

There's another issue, though, and that is the content. William's
essays have been about using the Private Use Area to encode ligatures,
glyph variants, ornamental borders, and other things that simply do not
belong in a character encoding framework, public or private. They are
problems to be solved with markup or with existing Unicode mechanisms
(like ZWJ for ligatures), not by allocating new characters. A little
familiarity with the basics of character encoding in general, and
Unicode policies in particular, would do a world of good here.
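
For anyone unfamiliar with the ZWJ mechanism, here is a minimal Python
sketch of what "requesting a ligature" looks like in plain text. The
helper names (request_ligature, strip_joiners) are made up for this
illustration and are not part of any library. Whether a ligature is
actually displayed depends on the font and the rendering engine; the
text itself still contains only ordinary letters.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER

def request_ligature(text: str, pair: str) -> str:
    """Insert ZWJ between the two letters of every occurrence of `pair`.

    Hypothetical helper: it only marks where a ligature is requested.
    No new character is allocated for the ligature itself.
    """
    if len(pair) != 2:
        raise ValueError("pair must be exactly two characters")
    return text.replace(pair, pair[0] + ZWJ + pair[1])

def strip_joiners(text: str) -> str:
    """Remove every ZWJ, recovering the underlying letters."""
    return text.replace(ZWJ, "")

marked = request_ligature("fish", "fi")
# Show the code points so the invisible joiner can be seen.
print(" ".join(f"U+{ord(c):04X}" for c in marked))
```

Removing the joiners gives back the original spelling, so searching and
sorting can still work on the plain letters. A private code point for
an "fi" glyph would not have that property.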

To make matters worse, we keep seeing the same type of proposal over
and over again, despite the repeated advice of top Unicode experts and
insiders that the proposals are misguided and inappropriate. There are
continued references to these PUA creations being "promoted" to Unicode,
and continued musings about the Unicode Technical Committee
"reconsidering" or "revisiting" decisions that are at the very core of
its policies.

Advice is not being heeded and lessons are not being learned. That is
what leads someone like Christopher to write "Please stop bothering the
list" -- no other wording appears to be strong enough.

Now, I may be on somewhat shaky ground in criticizing William's PUA
proposals for being outside the mainstream. I have a few unorthodox
opinions of my own. In particular, I happen to like the Plane 14
language tags and feel they should not be deprecated quite as strongly
as they are. I agree with the Unicode Standard that language tagging is
usually not necessary and that many higher-level protocols offer better
mechanisms for handling it. Even so, there are cases (rare though they
may be) where language tagging in plain text can be beneficial, and I
feel it is wrong for Unicode to offer me a solution and then tell me
that I must never use it, except in the presence of "special protocols"
which are never identified. I think Plane 14 language tags should be
available for responsible adults to use WHEN they are appropriate and
WHEN there is no higher-level protocol like XML to handle out-of-band
tagging.
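
To show how little machinery is involved, here is a minimal Python
sketch of a Plane 14 language tag. It assumes only the published
layout: U+E0001 LANGUAGE TAG introduces the tag, and the tag characters
U+E0020..U+E007E mirror ASCII 0x20..0x7E at an offset of 0xE0000. The
function names are my own for this illustration, not an existing API.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000     # tag characters mirror ASCII at this offset
TAG_BLOCK = range(0xE0000, 0xE0080)  # the whole Tags block

def encode_language_tag(tag: str) -> str:
    """Return the Plane 14 tag sequence for an ASCII tag such as 'fr'."""
    out = [chr(LANGUAGE_TAG)]
    for ch in tag:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not a taggable ASCII character: {ch!r}")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)

def strip_tags(text: str) -> str:
    """Drop every character in the Tags block, leaving the plain text."""
    return "".join(c for c in text if ord(c) not in TAG_BLOCK)

tagged = encode_language_tag("fr") + "bonjour"
# The tag for "fr" is U+E0001 U+E0066 U+E0072.
print(" ".join(f"U+{ord(c):04X}" for c in tagged[:3]))
print(strip_tags(tagged))
```

A process that does not care about language can ignore the tag
characters or strip them as above, which is part of why I consider them
harmless when used sparingly.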

I'm also a fan of SCSU, which, although published as a Unicode Technical
Standard, doesn't seem to get much respect or support from the Unicode
cognoscenti (Markus Scherer being a notable exception). I keep hearing
that its stateful design and use of sub-0x20 bytes make it unsuitable
for all kinds of purposes, and that general-purpose compression methods
are more efficient and better-supported. The fact is, SCSU is a
surprisingly effective compression technique that requires very little
overhead, is easy to decode, and is not tied to any single vendor's
implementation. But sometimes I think I'm the only one who feels that
way.

And, of course, I've created (and described on the list) several
experimental Transformation Formats and shared my experiences with
others who have done the same. This annoys some UTC members who feel
that this is not an appropriate activity or worry that people are likely
to confuse the "real" UTFs with the experiments.

So I'm not totally in the Unicode mainstream. But at least I like to
think I am listening to the advice of others and that I don't bombard
the list repeatedly with the same messages.

The "consensus," if any, seems to be that it is the *combination* of
repetition, excessive verbosity, and topics contrary to the principles
of Unicode that is annoying list members.

