Re: Plane 14 Proposal

From: John Cowan (john_cowan@hotmail.com)
Date: Mon Aug 04 1997 - 14:57:54 EDT


Due to network problems, I can read mail at cowan@ccil.org but
cannot post, reply, or send from there. Please direct all private
replies to cowan@ccil.org, not the HotMail address. Thanks.

This is a document I sent to the Unicode Technical Committee
for their consideration: it represents my views on Unicode
tagging. It endorses the Plane 14 proposal, but adds a
mechanism for people to create "public generic tags" that will not
collide with other people's generic tags.

Title: Plane 14 Characters for Generic Tags - Response
Source: John Cowan <cowan@ccil.org>
Primary Author: John Cowan (no affiliation)
Status: Expert contribution
Action: For the consideration of UTC
References: See end of this document
Distribution: UTC and elsewhere as appropriate

This contribution is meant to be read in conjunction with the
"Plane 14 Characters for Generic Tags" proposal from the UTC
Working Group on Tagging and Annotation.

I urge the UTC to adopt the proposal with the following
substantive change, a drop-in replacement for the section
"Generic Tags":

Generic Tags

Generic tags are tags whose meaning is completely unspecified
by the standard. Generic tags may be used by cooperating parties
that have agreed on how to interpret the tag values they contain.
Reaching that agreement is left to a higher-level protocol between
the interoperating processes.

A generic tag is identified by prefixing a tag value with
U-000E0000 GENERIC TAG.

For example, two interoperating processes may agree to use
dinglesnort tagging, making use of the Plane 14 generic tags.
To embed a dinglesnort tag with the value "glop_8" in Unicode
plain text, each ASCII character of "glop_8" is converted to the
corresponding Plane 14 tag character (its ASCII value plus
0xE0000), and GENERIC TAG is prefixed:

U-000E0000 U-000E0067 U-000E006C U-000E006F U-000E0070 U-000E005F
U-000E0038
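The conversion described above can be sketched in Python as follows. This is an illustration only, assuming (as the example implies) that each tag character is its ASCII code point plus 0xE0000 and that tag values consist of printable ASCII; the function names are hypothetical.

```python
# Code points as proposed in this contribution (assumed, not final Unicode):
GENERIC_TAG = 0xE0000  # GENERIC TAG prefix
TAG_OFFSET = 0xE0000   # tag character = ASCII code point + TAG_OFFSET


def encode_generic_tag(value: str) -> str:
    """Return GENERIC TAG followed by the Plane 14 form of an ASCII value."""
    if not value:
        raise ValueError("tag value must not be empty")
    for ch in value:
        # Assumption: only printable ASCII (0x20..0x7E) has a tag character.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return chr(GENERIC_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in value)


def format_code_points(s: str) -> str:
    """Render each character in the U-XXXXXXXX notation used in this post."""
    return " ".join(f"U-{ord(ch):08X}" for ch in s)


print(format_code_points(encode_generic_tag("glop_8")))
```

For "glop_8" this prints the same seven code points listed above, beginning with U-000E0000.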

This sequence is then expressed in whichever encoding form is
required (UCS-4, UTF-16, UTF-8, or UTF-7) and embedded in the text
at the relevant point.
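To make the encoding step concrete, here is a small Python sketch that serializes the "glop_8" tag in three of the forms named above using Python's standard codecs. It assumes the same code point mapping as the example (ASCII value plus 0xE0000); UTF-7 is left out, and the function name is hypothetical.

```python
from __future__ import annotations

GENERIC_TAG = 0xE0000  # as proposed in this contribution
TAG_OFFSET = 0xE0000   # assumed: tag character = ASCII code point + 0xE0000


def encoded_forms(value: str) -> dict[str, bytes]:
    """Serialize GENERIC TAG + value in several Unicode encoding forms."""
    tag = chr(GENERIC_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in value)
    forms = {}
    # Python codec names; for these code points UCS-4 coincides with UTF-32.
    for codec in ("utf-32-be", "utf-16-be", "utf-8"):
        data = tag.encode(codec)
        if data.decode(codec) != tag:  # every form should round-trip losslessly
            raise RuntimeError(f"{codec} did not round-trip")
        forms[codec] = data
    return forms


for codec, data in encoded_forms("glop_8").items():
    print(codec, len(data), "bytes; GENERIC TAG =", data[:4].hex(" ").upper())
```

Each of the seven characters occupies four bytes in all three forms (28 bytes in total). In UTF-16, GENERIC TAG becomes the surrogate pair DB40 DC00; in UTF-8 it becomes F3 A0 80 80.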

Some organizations may wish to devise generic tags that are
guaranteed not to conflict with those assigned by other
organizations. To that end, generic tags whose first character is
an uppercase ASCII letter, called "public generic tags", shall be
created only according to the following conventions.

The creators of public generic tags must first have (or belong to
an organization that has) an Internet domain name, such as Sun.COM.
They then reverse this name, component by component, to obtain (in
this example) COM.Sun, and use the result as a prefix for all their
public generic tags, relying on a local convention developed within
the organization to administer the remainder of each tag name.

Such a local convention might specify that the further components
of the prefix be division, department, project, machine, or login
names.

The first component of a public generic tag is always written in
all-uppercase ASCII letters and should be one of the top-level
domain names, currently COM, EDU, GOV, MIL, NET, ORG, or INT, or one
of the two-letter country codes specified in [ISO3166].
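The prefix construction can be sketched in Python as below. The helper names are hypothetical; the list of generic top-level domains is the one given above, and the sketch only checks that a country code is two ASCII letters, not that it actually appears in [ISO3166]. Components other than the top-level domain keep their original case, as in "COM.Sun".

```python
import re

# Generic top-level domains listed in this contribution.
GENERIC_TLDS = {"COM", "EDU", "GOV", "MIL", "NET", "ORG", "INT"}


def public_tag_prefix(domain: str) -> str:
    """Reverse a domain name component by component, upper-casing the TLD."""
    parts = domain.strip(".").split(".")
    if len(parts) < 2 or any(not p for p in parts):
        raise ValueError(f"not a usable domain name: {domain!r}")
    tld = parts[-1].upper()
    # Approximation: any two ASCII letters are accepted as a country code.
    if tld not in GENERIC_TLDS and not re.fullmatch(r"[A-Z]{2}", tld):
        raise ValueError(f"unrecognized top-level domain: {tld!r}")
    # parts[-2::-1] walks the remaining components from right to left.
    return ".".join([tld] + parts[-2::-1])


def public_tag(domain: str, *local_names: str) -> str:
    """Append organization-specific components (hypothetical local convention)."""
    return ".".join([public_tag_prefix(domain), *local_names])
```

With this sketch, public_tag_prefix("Sun.COM") gives "COM.Sun", and public_tag("ccil.org", "cowan", "glop_8") gives "ORG.ccil.cowan.glop_8", the example tag used later in this section.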

This convention piggybacks tag selection on an existing, widely
known registry of unique names instead of requiring a separate
registry for tags. It is also the convention used for creating
package names in the Java language; parts of this section are
derived from section 7.7 of [JLS].

Generic tags beginning with a lowercase ASCII letter are reserved
for private use and will never be standardized. They are useful
where brevity is the most important consideration: such a tag may
be as short as two UCS-4 characters (GENERIC TAG followed by a
single tag character).

Generic tags beginning with any other ASCII character are reserved
for future standardization.
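The three classes of generic tag are distinguished solely by the first character of the tag value. A minimal Python sketch of that rule follows; the function name and the returned labels are illustrative, not part of the proposal.

```python
def classify_generic_tag(value: str) -> str:
    """Classify a generic tag value by its first ASCII character."""
    if not value or not value.isascii():
        raise ValueError("tag value must be non-empty ASCII")
    first = value[0]
    if "A" <= first <= "Z":
        return "public"    # must follow the reversed-domain-name convention
    if "a" <= first <= "z":
        return "private"   # private use; never standardized
    return "reserved"      # any other ASCII character: future standardization
```

Under this rule "ORG.ccil.cowan.glop_8" is public, "glop_8" is private, and a value such as "_x" or "8" is reserved.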

For example, the public generic tag "ORG.ccil.cowan.glop_8" would be
represented as:

U-000E0000 U-000E004F U-000E0052 U-000E0047 U-000E002E
U-000E0063 U-000E0063 U-000E0069 U-000E006C U-000E002E
U-000E0063 U-000E006F U-000E0077 U-000E0061 U-000E006E
U-000E002E U-000E0067 U-000E006C U-000E006F U-000E0070
U-000E005F U-000E0038

Again, this sequence is then expressed in whichever encoding form
is required and embedded in the text at the relevant point.
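Going the other way, a receiving process has to find generic tags in text and recover their ASCII values. This contribution does not say how a tag is terminated, so the Python sketch below assumes (hypothetically) that a generic tag is GENERIC TAG followed by the longest run of tag characters in U+E0020..U+E007E. All names here are illustrative.

```python
from __future__ import annotations

GENERIC_TAG = 0xE0000  # as proposed in this contribution
TAG_OFFSET = 0xE0000   # assumed: tag character = ASCII code point + 0xE0000


def is_tag_char(ch: str) -> bool:
    # Assumption: tag characters mirror printable ASCII, U+E0020..U+E007E.
    return 0xE0020 <= ord(ch) <= 0xE007E


def extract_generic_tags(text: str) -> tuple[str, list[str]]:
    """Split text into (plain text, [decoded generic tag values])."""
    plain, tags = [], []
    i = 0
    while i < len(text):
        if ord(text[i]) == GENERIC_TAG:
            # Consume the maximal run of tag characters after the prefix.
            j = i + 1
            while j < len(text) and is_tag_char(text[j]):
                j += 1
            tags.append("".join(chr(ord(c) - TAG_OFFSET) for c in text[i + 1 : j]))
            i = j
        else:
            plain.append(text[i])
            i += 1
    return "".join(plain), tags
```

Applied to the 22 code points listed above, extract_generic_tags returns an empty plain-text string and the single tag value "ORG.ccil.cowan.glop_8".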

*******************************************************************

References:

[JLS]
    The Java Language Specification, Version 1.0.
    by James Gosling, Bill Joy, and Guy Steele
    The Java Series
    ISBN 0-201-63451-1
    <http://www.javasoft.com/docs/books/jls>.

[ISO3166]
    ISO Standard 3166 (1981).

-- 
John Cowan                       cowan@ccil.org
        Please do not use "Reply"
        e'osai ko sarji la lojban. ("Please support Lojban!")


