From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Sat Feb 15 2003 - 07:00:49 EST
I was interested to read the comment by Rick McGowan.
Thank you for your note. I found the MARC system described at the following
place on the web.
It is very interesting and I have started to read about it.
I looked back at what I had written and found the following.
Books in libraries are often classified with a code consisting of digits and
a full stop character. For example, the number 515.53 is on a label which
is still on the spine of a book which I bought in a sale of withdrawn books
from a library. So, if U+E0002 were used to introduce a tag for the library
book classification code, then a sequence starting with U+E0002 and using
some other tag characters could be used to classify the subject matter of
any document which is stored in computerized form.
I also found the following about the Dewey Decimal Classification system.
I realize, rereading what I wrote in the light of Rick's comment, that I may
well not have expressed my meaning clearly.
My intention was to convey the type of use shown in the following example.
Suppose that there is a plain text document written in Cyrillic script, and
that at the start of that document there is a U+E0001 character, then some
tag characters indicating the language, then a U+E0002 character, and then
the characters U+E0036 U+E0030 U+E0038 (tag digits 6, 0 and 8). Someone
could then look at the document using a suitable computer system and find
out, from the few plane 14 characters at the start of the document, in which
language the document is written and also that the general topic area of the
document is inventions and patents. This is because 608 is the Dewey Decimal
Classification number for inventions and patents. However, in an ordinary
document viewing package the tags would not be displayed, so they would not
get in the way.
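As a minimal Python sketch of the example above, assuming that the tag characters U+E0020 to U+E007E simply mirror printable ASCII (as the existing language tag does) and that U+E0002 is used as a classification tag as proposed here (U+E0002 is not assigned by Unicode; the names `CLASSIFICATION_TAG`, `make_header` and `read_header` are hypothetical, and "ru" is just an illustrative language code):

```python
from __future__ import annotations

TAG_BASE = 0xE0000            # a tag character is U+E0000 plus the ASCII code
LANGUAGE_TAG = 0xE0001        # LANGUAGE TAG, defined by Unicode
CLASSIFICATION_TAG = 0xE0002  # hypothetical: proposed here for a Dewey code

def to_tag_chars(text: str) -> str:
    """Map printable ASCII onto the tag characters U+E0020..U+E007E."""
    if not all(0x20 <= ord(c) <= 0x7E for c in text):
        raise ValueError("tag arguments must be printable ASCII")
    return "".join(chr(TAG_BASE + ord(c)) for c in text)

def make_header(language: str, dewey: str) -> str:
    """U+E0001 + language tag, then U+E0002 + classification tag."""
    return (chr(LANGUAGE_TAG) + to_tag_chars(language)
            + chr(CLASSIFICATION_TAG) + to_tag_chars(dewey))

def read_header(doc: str) -> tuple[dict[int, str], str]:
    """Split leading tag fields (keyed by identifier) from the visible text."""
    fields: dict[int, str] = {}
    current = None
    i = 0
    while i < len(doc):
        cp = ord(doc[i])
        if 0xE0001 <= cp <= 0xE001F:        # a tag identifier starts a new field
            current = cp
            fields[current] = ""
        elif 0xE0020 <= cp <= 0xE007E and current is not None:
            fields[current] += chr(cp - TAG_BASE)  # decode back to ASCII
        else:                                # first ordinary character ends the header
            break
        i += 1
    return fields, doc[i:]

doc = make_header("ru", "608") + "Изобретения и патенты"
fields, text = read_header(doc)
print(fields[LANGUAGE_TAG], fields[CLASSIFICATION_TAG])  # ru 608
```

Note that `to_tag_chars("608")` yields exactly U+E0036 U+E0030 U+E0038, and `text` is only the Cyrillic body, which is all an ordinary viewer would show.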
My suggestion about using International Standard Book Numbers with a tag
type code, which could perhaps be U+E0003, needs further thought.
Does the tag code mean "This is the start of the text of the book with the
following ISBN" or does it mean "Here is a reference to the ISBN of a book
to which I am referring"? Can the two meanings be distinguished, perhaps by
putting a tag R after the U+E0003 and in front of the tag digits for a
reference to the book, and not using a tag R when the use is at the start of
the text of the electronic book itself? Or is there a better way? There are
possibilities for progress here, provided that tag characters are retained,
on the basis of being reserved for use in particular protocols, and provided
that the Unicode Technical Committee is willing to consider defining
additional tag types at some time in the future.
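The tag R idea could work roughly as follows. This is a minimal Python sketch, assuming U+E0003 is the (hypothetical, unassigned) ISBN tag identifier and that the argument is the ISBN in tag characters, optionally preceded by a tag R. Since an ISBN begins with a digit, a leading R cannot be confused with the number itself. `encode_isbn_tag` and `parse_isbn_tag` are illustrative names, and 123456789X is a placeholder ISBN-10, not a real book.

```python
from __future__ import annotations

TAG_BASE = 0xE0000
ISBN_TAG = 0xE0003  # hypothetical: proposed in this post, not assigned by Unicode

def encode_isbn_tag(isbn: str, reference: bool) -> str:
    """U+E0003, then an optional tag 'R' (a reference), then the ISBN as tag characters."""
    arg = ("R" if reference else "") + isbn
    return chr(ISBN_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in arg)

def parse_isbn_tag(seq: str) -> tuple[str, str]:
    """Return ('reference' or 'start', isbn) for a U+E0003 tag sequence."""
    if not seq or ord(seq[0]) != ISBN_TAG:
        raise ValueError("sequence does not begin with U+E0003")
    arg = ""
    for c in seq[1:]:
        if not 0xE0020 <= ord(c) <= 0xE007E:
            raise ValueError("non-tag character inside the tag argument")
        arg += chr(ord(c) - TAG_BASE)
    if arg.startswith("R"):          # tag R: a reference to another book
        return ("reference", arg[1:])
    return ("start", arg)            # no tag R: start of the book's own text
```

For example, `parse_isbn_tag(encode_isbn_tag("123456789X", True))` gives `("reference", "123456789X")`, while the same call with `False` gives `("start", "123456789X")`.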
My suggestion for U+E0004 could be very useful. Suppose that the haiku
which I included at the end of the document had an International Literary
Work Number, if such a system of International Literary Work Numbers comes
into existence in the future. I could produce a plain text file which
starts with U+E0004 and a number of tag characters and then the text of the
haiku. I could place that file somewhere on the web. Search engines might
log it. If then someone is writing an article about the topic of poetry and
Unicode, then he or she might refer to that haiku and include a tag encoded
reference to it, using its International Literary Work Number. A reader of
that document could decide to have a look at the text and could then search
the internet for the text of the haiku, knowing that the search is made
easier because the International Literary Work Number is unique to that
haiku, whereas searching for Phaistos Disc might not find it at all, or might
find it as but one of many search engine matches for that term.
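A minimal Python sketch of such a file, assuming U+E0004 as the (hypothetical, unassigned) identifier for an International Literary Work Number and the same ASCII-mirroring tag characters. Since no such numbering system exists, "12345" and the body text are placeholders, and `make_work_file`, `work_number` and `visible_text` are illustrative names. A search tool would read the number; an ordinary viewer would show only the text.

```python
import re

TAG_BASE = 0xE0000
LITERARY_WORK_TAG = 0xE0004                  # hypothetical: proposed ILWN tag
TAG_RUN = re.compile("[\U000E0000-\U000E007F]+")  # any run of plane 14 tag characters

def tag_encode(text):
    """Map printable ASCII onto tag characters."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text)

def make_work_file(number, body):
    """U+E0004, the work number as tag characters, then the plain text."""
    return chr(LITERARY_WORK_TAG) + tag_encode(number) + body

def work_number(doc):
    """Return the decoded work number if doc starts with U+E0004, else None."""
    m = TAG_RUN.match(doc)
    if not m or ord(m.group()[0]) != LITERARY_WORK_TAG:
        return None
    return "".join(chr(ord(c) - TAG_BASE) for c in m.group()[1:])

def visible_text(doc):
    """What an ordinary viewer would display: the text with tag characters removed."""
    return TAG_RUN.sub("", doc)

doc = make_work_file("12345", "first line\nsecond line\nthird line")
print(work_number(doc))  # 12345
```

The unique number, not the title words, becomes the search key, which is the advantage described above.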
All of these things, and maybe many more, will be possible if tag characters
are not fully deprecated and if the possibility of defining more tag
character types is kept open.
In my posting I wrote the following.
Perhaps all of plane 14 needs to be declared an area considered as
deprecated in general terms, yet where codes for use with particular
protocols can be defined by the Unicode Technical Committee, so that the
potential for using such futuristic developments and encoding them within
the Unicode framework is preserved?
I feel that that is the way forward. In some ways it would be a compromise,
yet it is more than a compromise: it is a far-reaching, forward-looking
policy option which would both protect the present mainstream use of Unicode
and provide for futuristic possibilities of conveying information in
Unicode-compatible files in a precise, formally defined manner. At present,
characters are either regular Unicode codes or Private Use Area codes. This
could be changed so that characters are either regular Unicode codes,
reserved Unicode codes, or Private Use Area codes, with the reserved Unicode
codes all being in plane 14.
15 February 2003