RE: Tags and the Private Use Area

From: Marco Cimarosti
Date: Thu May 03 2001 - 06:09:14 EDT

William Overington wrote:
> Kenneth Whistler wrote:
> > Among other things, you have yet to have meet the challenge
> > by Michael Kaplan to provide a convincing case for their
> > requirement.
> Oh, there was no need. Michael stated his challenge as a
> "put up, or shut up" challenge [...]

I am probably not the only one who feels that this is not the way to
discuss things.

I think that Michael did not state any "put up or shut up" challenge, but
rather made a very sensible objection to the whole subject of this thread:
"what is it for?". I think that such an objection should be answered
politely, rather than haughtily refused.

So, if William drops it, I will take up the challenge -- at the risk of
repeating things that others and I have already written.

The PUA is (or might be) used for, e.g.:

1) linguistic research (e.g. handling texts in unencoded or unencodable
ancient scripts);

2) recreational linguistics (e.g. constructed scripts and the like);

3) encoding research (e.g. experimenting with interim encodings while
preparing proposals);

4) orthography development (e.g. special characters being tried out for
as-yet unwritten living languages);

5) interim encodings for non-linguistic notations (e.g. people who need
labanotation to discuss dance over the Internet).

Of course, any one person can be involved in more than one project in
each of the disciplines above, e.g.: a scholar may study (or teach) both
hieroglyphics and cuneiform; a "game master" may be discussing several
role-playing games in ConLangs mailing lists, etc.

After a few years of surfing linguistics-related forums, I noticed that the
same names tend to occur in disciplines 1 to 4, and I wouldn't be surprised
if some of these people are also interested in point 5.

All this is to say that, yes, there may be a latent need to exchange PUA
encodings and, consequently, to define some sort of protocol that attaches
the intended meaning to the otherwise meaningless PUA codepoints.
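
To make the "otherwise meaningless" point concrete, here is a minimal Python
sketch (an illustration only, not part of any proposal). It assumes nothing
beyond the standard `unicodedata` module, which reports general category `Co`
for private-use characters; the function name `find_pua` and the sample string
are made up. A receiver can detect that PUA codepoints are present, but
nothing in the text itself says what they stand for.

```python
import unicodedata

def find_pua(text):
    """Return (index, 'U+XXXX') pairs for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = private use
    ]

# Hypothetical message containing two BMP private-use codepoints.
sample = "dance step: \ue000\ue001 end"
print(find_pua(sample))
```

The list tells the receiver where the private characters are, and that is all:
their meaning has to come from some agreement outside the text.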

Such a protocol can be private or public but, clearly, it *cannot* be a part
of the Unicode Standard, because this would contradict the basic statement
that everybody can do whatever they want with the PUA.

One can imagine that, in a distant future, Unicode could choose to
"reference" such a protocol as "related information", but no more.
However, before such a thing can happen, there must be something to be
referenced.

I think that the discussion is currently focusing on the wrong thing. It is
not so important how a certain text file will declare its "PUA semantics":
after all, there will never be *one* method for doing this (text that has a
MIME header will presumably use it; rich text will have its own means;
mark-up languages may add a tag for this, etc.).

IMHO, it would be more interesting (and would have less impact on Unicode
policies) to discuss *what* this "PUA semantics" data could look like. Will
it be a UniData-like file? Or will it be an XML-based file? Will it include a
default font? Which kind of font? And how will all this material be used:
will programmers manually download it and package it in their applications?
Or will it be automatically downloaded and installed à la plug-in?
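
Just to give the first of these questions a concrete shape, here is a minimal
Python sketch of what a UniData-like description might look like and how an
application could read it. Everything about the format is an assumption made
up for this example: the three semicolon-separated fields (code point, name,
default font), the character names, the font name and the function
`parse_pua_data`. It is not an existing or proposed format.

```python
import unicodedata

# Hypothetical "PUA semantics" data in a UnicodeData-like layout:
# code point (hex); descriptive name; suggested default font
PUA_DATA = """\
# example private agreement
E000;EXAMPLE SCRIPT LETTER A;ExampleFont
E001;EXAMPLE SCRIPT LETTER B;ExampleFont
"""

def parse_pua_data(data):
    """Map each private-use character to its (name, font) pair."""
    table = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, name, font = (field.strip() for field in line.split(";"))
        ch = chr(int(code, 16))
        # Refuse entries that try to redefine non-private codepoints.
        if unicodedata.category(ch) != "Co":
            raise ValueError(f"U+{code} is not a private-use code point")
        table[ch] = (name, font)
    return table

table = parse_pua_data(PUA_DATA)
print(table["\ue000"])
```

An XML-based file could carry the same three pieces of information; the
sketch only shows that the data itself can be very small, whatever the
container turns out to be.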

Let me add, however, that this whole subject is *not* exactly the
highest-priority need that I have ever heard of. I personally can live even
with an "undefined PUA", and wouldn't spend my time developing such a thing.
If someone else wishes to start such work, I would certainly try to keep
myself informed about their progress -- but I would not like to follow
*every* single step of the discussion on *this* mailing list.

_ Marco
