From: Doug Ewell (doug@ewellic.org)
Date: Sat Jan 03 2009 - 11:37:02 CST
James Kass <thunder dash bird at earthlink dot net> wrote:
> Private Use Area just means user-defined area.  There's nothing secret 
> or damaging about user-defined characters, whether they be suitable 
> potential candidates for standard plain-text, or whether they are 
> destined to remain banished in the phantom zone for all eternity. 
> There will always be people wishing or needing to exchange 
> user-defined material, and there's nothing wrong with that.  They are 
> using the PUA correctly.
There seems to be a school of thought that private-use characters are 
inherently evil and should never be used, except perhaps within one's 
own personal system.  The thinking seems to be that people will want to 
search for these things and interoperability will be broken, and also 
that "private agreement" implies a certain degree of secrecy and 
extremely limited use.
Around 1993, it struck me as both obvious and brilliant that Unicode 
would provide a private-use area as part of its overall strategy: encode 
the most commonly used characters, but not just any old thing 
imaginable, and let users who wanted to represent any old thing 
imaginable in the Unicode architecture encode it as a private-use 
character.  I was not familiar with the East Asian encodings at the time 
and did not know that they also supported this useful mechanism.
Over time, the principle of "most commonly used characters" in Unicode 
expanded to include ancient scripts, musical symbols, and mathematical 
font variants, as well as just about every Han character that someone 
could dredge up instead of just the ones in existing standards.  But the 
PUA principle remained: you could still encode the Apple logo or Klingon 
or Ewellic in the PUA, and reap the benefits of the Unicode architecture 
without contaminating the Standard's repertoire.
At some point, perhaps with the rise of the Internet and powerful search 
engines, the idea began to spread that using PUA characters was always 
bad, because of the potential for conflict between different private 
agreements -- as if that possibility had not occurred to anyone before. 
I search for a document containing U+E690 and MegaFinder locates one for 
me, but my interpretation of U+E690 might differ from the one used by 
the author of the document.  The private agreement is not transmitted 
along with the document.  Supposedly this will cause great 
interoperability problems if I am not intelligent enough to understand 
that this is the nature of private-use codes.
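To make the U+E690 scenario concrete, here is a minimal Python sketch. The two "agreements" and the meanings in them are made up purely for illustration, and the helper names are my own; only the three private-use ranges come from the Standard.

```python
# The three private-use ranges defined by Unicode.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical private agreements: same code point, different meanings.
MY_AGREEMENT = {"\uE690": "letter from my own script"}
AUTHOR_AGREEMENT = {"\uE690": "company logo"}


def interpret(text: str, agreement: dict) -> list:
    """Resolve private-use characters through one particular agreement.

    Ordinary characters pass through unchanged; private-use characters
    that the agreement does not cover are flagged rather than guessed at.
    """
    result = []
    for ch in text:
        if is_private_use(ch):
            result.append(agreement.get(ch, "<unknown private-use character>"))
        else:
            result.append(ch)
    return result
```

A search engine can match U+E690 byte for byte, but `interpret("\uE690", MY_AGREEMENT)` and `interpret("\uE690", AUTHOR_AGREEMENT)` give different answers, because the agreement lives outside the document.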
This school of thought has also carried over to the BCP 47 language 
tagging arena, where people can create tags like "x-piglatin", whose 
meaning should be obvious even without a written and signed "agreement," 
and can also create "qaa" or "x-abc123", whose meaning would be far from 
obvious, and whose creator would have to be very naïve not to understand 
this.  Despite a serious lack of evidence that private-use tags are 
causing a mainstream interoperability crisis, successive versions of BCP 
47 have added more and more warnings against using them.
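For anyone who wants to spot such tags mechanically, here is a rough Python sketch. It is deliberately simplified and is not a BCP 47 validator: it looks only at the "x" singleton and the "qaa" through "qtz" language subtags, and ignores the private-use script and region codes. The function name and the labels it returns are my own invention.

```python
def private_use_kind(tag: str):
    """Classify the private-use mechanism a language tag relies on, if any.

    Simplified sketch: covers only the "x" singleton and the
    "qaa".."qtz" primary language subtags.
    """
    subtags = tag.lower().split("-")
    primary = subtags[0]

    # A tag that is private use from the first subtag on, e.g. "x-piglatin".
    if primary == "x" and len(subtags) > 1:
        return "private-use tag"

    # Primary language subtags reserved for private use: "qaa" through "qtz".
    if (len(primary) == 3 and primary.isascii() and primary.isalpha()
            and "qaa" <= primary <= "qtz"):
        return "private-use language subtag"

    # An ordinary tag with a trailing private-use sequence, e.g. "en-x-foo".
    if "x" in subtags[1:-1]:
        return "private-use extension"

    return None
```

Note that the function can tell you that "x-piglatin" and "x-abc123" are both private use, but not that one is self-explanatory and the other is not; that part is still up to the reader.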
If you create an encoding standard of any sort, include a private-use 
mechanism as a defense against having to encode every conceivable blob, 
and then turn around and discourage use of that mechanism, the natural 
consequence is that you will feel compelled to encode every conceivable 
blob.
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Sat Jan 03 2009 - 11:39:49 CST