Re: Tags and the Private Use Area

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Thu Apr 26 2001 - 11:47:03 EDT


I have updated my suggestion. Here is the latest version for discussion.

Suppose that U+100002 is designated PUA INTERPRETATION TAG and that there is
a set of private use area tag characters (U+100020..U+10007F), all of these
code points being in the upper private use area.

I suggest that the description mention that, where displayed for analysis
purposes, these private use area tags should be displayed as yellow on a red
background. Ordinary Unicode tags displayed for analysis have no specified
colour, but some people might like to display them as white on blue so that
they do not conflict visually with the suggested private use area tags.
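
As an illustration of the display convention, here is a minimal sketch in
Python that renders a string for analysis on a terminal supporting ANSI
colour escape sequences. It rests on assumptions of mine that are not part
of the suggestion as stated: that each private use area tag U+1000xx stands
for the ASCII character U+00xx (by analogy with the ordinary Unicode tags at
U+E0020..U+E007F), and that ANSI colours are an acceptable stand-in for
"yellow on red" and "white on blue". The function name render_for_analysis
is hypothetical.

```python
# Assumption: PUA tag U+1000xx mirrors ASCII U+00xx, just as the ordinary
# Unicode tag U+E00xx mirrors ASCII U+00xx.
PUA_TAG_FIRST, PUA_TAG_LAST = 0x100020, 0x10007F
STD_TAG_FIRST, STD_TAG_LAST = 0xE0020, 0xE007F

YELLOW_ON_RED = "\x1b[33;41m"   # ANSI: yellow foreground, red background
WHITE_ON_BLUE = "\x1b[37;44m"   # ANSI: white foreground, blue background
RESET = "\x1b[0m"


def render_for_analysis(text: str) -> str:
    """Show tag characters as their ASCII counterparts, in colour."""
    out = []
    for ch in text:
        cp = ord(ch)
        if PUA_TAG_FIRST <= cp <= PUA_TAG_LAST:
            out.append(f"{YELLOW_ON_RED}{chr(cp - 0x100000)}{RESET}")
        elif STD_TAG_FIRST <= cp <= STD_TAG_LAST:
            out.append(f"{WHITE_ON_BLUE}{chr(cp - 0xE0000)}{RESET}")
        else:
            out.append(ch)  # all other characters pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    print(render_for_analysis("plain " + chr(0x100041) + chr(0xE0041)))
```

The U+100002 character itself is passed through unchanged in this sketch; a
real analysis tool would probably want to make it visible in some way too.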

Naturally, this definition within the private use area is not an absolute
definition. The Unicode Consortium is not being asked to endorse it, nor
would it do so, by its own statement. All that can reasonably be sought is
that the practice, and the protocols expressed using such private use area
tags, are so well thought out and designed by interested users that most
users will wish to use them for most applications where private use area
characters are used. It cannot be expected that most users will agree to
such a system, yet one can always hope.

Specific protocols to use with such tagging can be devised.

I put forward the idea that a file of plain Unicode text containing
characters from the private use area may, if so desired, carry information
about the character set or sets to which those private use area codes
refer. The information is placed in the file before the first use of any
character to which it relates. It consists of the U+100002 character
followed by a number of private use area tag characters (U+100020..U+10007F)
which express one or more groups of characters in the following formats.

A Uniform Resource Locator of a font file.

For example,

http://www.somewebsite.net/oldchem.ttf

A Uniform Resource Locator of a file describing the characters, the locator
being placed within square brackets.

For example,

[http://www.somewebsite.net/oldchem.htm]

A comment about the characters, in natural language, within wavy brackets.

For example,

{Symbols used in early chemistry}

A list, within round brackets, specifying the parts of the private use areas
to which this description refers.

For example,

(E000..E2FF,E700..E7FF)
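
To make the four formats concrete, here is a minimal sketch in Python of how
a sequence might be written and read back. Several things in it are my
assumptions rather than part of the suggestion: that the tag U+1000xx stands
for the ASCII character U+00xx, that the groups are simply concatenated one
after another, that a font locator contains no brackets of any kind, and
that range values are hexadecimal code points. The function names
encode_pua_tags, decode_pua_tags and parse_groups are hypothetical.

```python
import re

PUA_INTERPRETATION_TAG = "\U00100002"
TAG_OFFSET = 0x100000  # assumption: U+1000xx stands for ASCII U+00xx


def encode_pua_tags(payload: str) -> str:
    """Return U+100002 followed by the payload as PUA tag characters."""
    for ch in payload:
        if not 0x20 <= ord(ch) <= 0x7F:
            raise ValueError(f"no PUA tag character for {ch!r}")
    return PUA_INTERPRETATION_TAG + "".join(
        chr(ord(ch) + TAG_OFFSET) for ch in payload
    )


def decode_pua_tags(text: str) -> str:
    """Return the ASCII payload of the first tag sequence in text, or ''."""
    start = text.find(PUA_INTERPRETATION_TAG)
    if start < 0:
        return ""
    out = []
    for ch in text[start + 1:]:
        cp = ord(ch)
        if not 0x100020 <= cp <= 0x10007F:
            break  # first non-tag character ends the sequence
        out.append(chr(cp - TAG_OFFSET))
    return "".join(out)


# One alternative per format; only one named group matches at a time.
GROUP = re.compile(
    r"\[(?P<description>[^\]]*)\]"   # [URL of description file]
    r"|\{(?P<comment>[^}]*)\}"       # {natural language comment}
    r"|\((?P<ranges>[^)]*)\)"        # (E000..E2FF,E700..E7FF)
    r"|(?P<font>[^\[\]{}()]+)"       # bare URL of a font file
)


def parse_groups(payload: str) -> dict:
    """Split a decoded payload into font, description, comment, ranges."""
    info = {"font": None, "description": None, "comment": None, "ranges": []}
    for match in GROUP.finditer(payload):
        kind = match.lastgroup
        value = match.group(kind).strip()
        if kind == "ranges":
            for part in value.split(","):
                low, _, high = part.strip().partition("..")
                info["ranges"].append((int(low, 16), int(high or low, 16)))
        elif kind == "font" and not value:
            continue  # ignore stray whitespace between groups
        else:
            info[kind] = value
    return info
```

With the four examples above joined into one payload, decode_pua_tags
recovers the ASCII text from the tag characters and parse_groups separates
it into the font locator, the description locator, the comment and the list
of code point ranges.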

The name of the font to be used is always expressed as a full Uniform
Resource Locator using the private use area tag codes. A software package
using the data may, if it wishes to take the risk, simply take the file name
at the very end of that Uniform Resource Locator and search for that file
name in its own local font directory without accessing the internet.
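
The local lookup might be sketched in Python as follows. This is only an
illustration under my own assumptions: the function names font_file_name and
local_font_candidate are hypothetical, and the local font directory is
supplied by the caller, since the suggestion does not say where it is.

```python
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def font_file_name(font_url: str) -> str:
    """Return the file name at the very end of the locator, or ''."""
    return PurePosixPath(urlparse(font_url).path).name


def local_font_candidate(font_url: str, font_dir: str) -> Optional[Path]:
    """Look for the font's file name in font_dir without using the network.

    Returns the local path if a file of that name exists, otherwise None.
    """
    name = font_file_name(font_url)
    if not name:
        return None
    candidate = Path(font_dir) / name
    return candidate if candidate.is_file() else None
```

The risk mentioned above is visible here: the lookup matches on the file
name alone, so a different font that happens to be stored locally under the
same name would be accepted.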

The suggestion is open for discussion and I hope to gain fairly widespread
agreement within the Unicode user community.

William Overington

26 April 2001


