Kenneth Whistler made the following comments.
> You are, of course, free to go off and invent such schemes, but this
> is not Unicode, nor is it plain text. It also violates most modern
> practice in software design, which avoids such mixing of layers in favor
> of clear, layered and modular design.
Well, separating data out into a collection of files, rather than keeping all
of the data in one file, is often useful, yet for some purposes it is not, in
my opinion, the best way to go.
Consider, for example, the analysis of cuneiform tablets, where the idea is
to have a three-dimensional scan of each clay tablet available on a computer
and to transcribe the information on the tablet into Unicode characters that
represent the cuneiform characters.
I feel that it would be helpful to be able to encode, all together in one
file, the graphical data that represents the cuneiform tablet itself and the
Unicode characters that represent the text on the tablet. This would allow
researchers to draw a closed loop around a particular area on the surface of
the tablet and attach a Unicode character to the loop, indicating that the
wedge marks inside the loop and the Unicode character represent the same
cuneiform character. There could be loops within loops, so that a run of
characters in the clay could be represented as a run of Unicode cuneiform
characters.
My suggested system will, I hope, allow that sort of computing effect to be
encoded all in one file in a straightforward manner. I am already extending
it so that codes from hexadecimal 400000 to hexadecimal FFFFFD have different
meanings depending on whether their final two bits are 00 or 01, each case
having a different process to be obeyed. I am thinking about how to achieve
the various effects. I am strongly influenced by the way that the encoding
system for PNG format graphic files works, where a graphics file can contain
chunks holding text information. However, the text in PNG files annotates
the picture as a whole, not individual parts of the picture; even so, it is
a nice design and it influences my thinking.
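As a rough illustration of how a program might tell the code ranges apart,
here is a minimal Python sketch. It uses the 000000 to 01FFFF range for
Unicode characters described in this post and the 400000 to FFFFFD range with
final bits 00 or 01. It also makes assumptions that are not in the post. Each
24 bit code is assumed to be stored as three bytes in big-endian order. Every
value whose meaning has not been described is treated as unspecified: that
covers 020000 to 3FFFFF and upper-range codes ending in 10 or 11. The names
CodeKind, classify and decode_stream are hypothetical.

```python
from enum import Enum


class CodeKind(Enum):
    UNICODE = "unicode"          # 000000..01FFFF: a Unicode code point
    PROCESS_00 = "process-00"    # 400000..FFFFFD with final two bits 00
    PROCESS_01 = "process-01"    # 400000..FFFFFD with final two bits 01
    UNSPECIFIED = "unspecified"  # any other 24 bit value (meaning not defined)


def classify(code: int) -> CodeKind:
    """Classify one 24 bit code by range and, in the upper range, by its final two bits."""
    if not 0 <= code <= 0xFFFFFF:
        raise ValueError(f"not a 24 bit code: {code:#x}")
    if code <= 0x01FFFF:
        # Surrogate code points (D800..DFFF) are not filtered out in this sketch.
        return CodeKind.UNICODE
    if 0x400000 <= code <= 0xFFFFFD:
        final_bits = code & 0b11
        if final_bits == 0b00:
            return CodeKind.PROCESS_00
        if final_bits == 0b01:
            return CodeKind.PROCESS_01
    return CodeKind.UNSPECIFIED


def decode_stream(data: bytes) -> list:
    """Split a byte stream (assumed: 3 bytes per code, big-endian) into runs of
    Unicode text and individual non-Unicode codes for special routines."""
    if len(data) % 3:
        raise ValueError("stream length is not a multiple of 3 bytes")
    items, text = [], []
    for i in range(0, len(data), 3):
        code = int.from_bytes(data[i:i + 3], "big")
        kind = classify(code)
        if kind is CodeKind.UNICODE:
            text.append(chr(code))            # ordinary Unicode handling
        else:
            if text:
                items.append(("text", "".join(text)))
                text = []
            items.append((kind.value, code))  # left to specially written routines
    if text:
        items.append(("text", "".join(text)))
    return items
```

Take a stream holding U+12000 and U+12001, followed by the code 400004 and
then U+0041. U+12000 and U+12001 are code points in the Cuneiform block,
which Unicode added later, in version 5.0. This stream decodes into three
items: a two-character text run, one "process-00" item and the text run "A".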
The PNG specification is available at the following place on the web.
An interesting analogy that might be useful can be found by looking at the
At the bottom of that page is a graphic that I drew in 1998, for a purpose
nothing whatsoever to do with cuneiform encoding, yet which graphic may be
helpful as a way of explaining what I mean.
Suppose, for example, that the eight wedge shapes in that picture each
indicated a particular letter of the English alphabet. Suppose also that it
were desired to annotate those wedges with an assessment of which letter
each one indicates, together with information about the order in which the
letters are to be read. Another researcher would then be able to review the
decision as to which letter each wedge shape represents, and also the
decision as to the order in which the letters appear. How would that best
be done? This analogy has just eight wedges, one wedge shape for each
letter. A clay tablet will have many more wedge marks, and some characters
may well consist of many wedge marks. I suggest that my idea for an
integrated system is just the sort of system that is needed for this type of
research. In that system 24 bit codes are used, and any code in the range
000000 through 01FFFF is used for a Unicode character. I would be entirely
happy to learn of any other, or better, methods of achieving this type of
computational capability; I am not saying that this must necessarily be the
only way or the best way. However, if an encoding system such as I suggest
were used, then the presence of a Unicode character could easily be detected
and processed using software that handles Unicode characters in standard
ways. Meanwhile, 24 bit codes that are not Unicode characters could be
processed by appropriate specially written software routines. I have in
mind the possibility of a Java program being used to process the whole file.
Thus each clay tablet would have one main file. Smaller files carrying the
image data and text characters for parts of the tablet could be produced by
separating out some of the data from the main file, using the loops referred
to earlier as a guide to the separating-out process. If various members of a
research team each worked separately on a portion of the tablet using such
separated-out files, they could each add their transcriptions to their own
separated-out file. When each team member had completed his or her
transcription, a new file could be produced by joining all of the files
together. The joining process would include a file containing those items
of clay remaining when the smaller files were separated out. This seems to
me a very exciting development in the way to process graphical information,
and I hope that the idea will be very effective in research on cuneiform
tablets.
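The separating-out and joining process could be sketched roughly as follows.
This is only an in-memory illustration, and it rests on assumptions that are
not in the post. Wedge marks are reduced to 2-D points on a projection of
the tablet surface, and each loop is a simple polygon. A mark belongs to the
first loop whose interior contains it, judged by an even-odd ray-casting
test. Nested loops and the actual file format are left out. Loop, Part,
inside, separate and join are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # a wedge mark, reduced to a 2-D position (assumption)


@dataclass
class Loop:
    """A closed loop drawn on the tablet surface, with the text attached to it."""
    vertices: List[Point]        # polygon vertices in drawing order
    transcription: str = ""      # Unicode text a researcher attaches to the loop


@dataclass
class Part:
    """A separated-out piece: the marks inside one loop, or the leftover clay."""
    loop: Optional[Loop]         # None for the remainder part
    marks: List[Point] = field(default_factory=list)


def inside(point: Point, polygon: List[Point]) -> bool:
    """Even-odd ray-casting test; points exactly on an edge may go either way."""
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                result = not result
    return result


def separate(marks: List[Point], loops: List[Loop]) -> Tuple[List[Part], Part]:
    """Give each loop its own Part; marks inside no loop go to the remainder."""
    parts = [Part(loop) for loop in loops]
    remainder = Part(None)
    for mark in marks:
        for part in parts:
            if inside(mark, part.loop.vertices):
                part.marks.append(mark)
                break  # first containing loop wins; nesting is not handled
        else:
            remainder.marks.append(mark)
    return parts, remainder


def join(parts: List[Part], remainder: Part) -> Tuple[List[Point], List[Loop]]:
    """Recombine the parts (with any new transcriptions) and the remainder."""
    marks = list(remainder.marks)
    loops = []
    for part in parts:
        marks.extend(part.marks)
        loops.append(part.loop)
    return marks, loops
```

In use, each team member would receive one Part and fill in
part.loop.transcription. The remainder Part plays the role of the file of
leftover clay when everything is joined back together.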
I am thinking of including in this encoding system facilities for expressing
rotations in quaternion format. This could be very useful: quaternions,
together with ordinary coordinates for position, could express the position
and orientation of the stylus used to make the wedge-shaped depressions in
the clay tablet.
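To give an idea of how a quaternion could describe the stylus, the following
minimal Python sketch rotates a direction vector v with a unit quaternion q,
computing q v q*. It assumes, purely for illustration, that the stylus axis
points along z in its own frame. The function names are hypothetical, and no
particular representation in the file is implied.

```python
import math
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z)
Vector = Tuple[float, float, float]


def from_axis_angle(axis: Vector, angle: float) -> Quaternion:
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("rotation axis must be non-zero")
    s = math.sin(angle / 2) / norm  # also normalizes the axis
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def multiply(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a*b; as a rotation, b is applied first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def conjugate(q: Quaternion) -> Quaternion:
    """Conjugate; for a unit quaternion this is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q: Quaternion, v: Vector) -> Vector:
    """Rotate vector v by unit quaternion q, computing q * (0, v) * conjugate(q)."""
    _, x, y, z = multiply(multiply(q, (0.0, *v)), conjugate(q))
    return (x, y, z)
```

Take a hypothetical stylus tilted by 30 degrees about the tablet's x axis,
tilt = from_axis_angle((1, 0, 0), math.radians(30)). Calling
rotate(tilt, (0, 0, 1)) then gives the stylus direction in tablet
coordinates, approximately (0, -0.5, 0.866). Rotations combine by
multiplication: rotating by q1 and then by q2 is the same as rotating once
by multiply(q2, q1).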
Quaternions are an interesting way to represent rotations. There are some
items about quaternions in the http://www.users.globalnet.co.uk/~ngo
webspace. These files include some links to demonstrations that run in a
Java-enabled browser. It is not necessary to read all of the mathematics and
computation in those files in order to enjoy the demonstrations. Some of the
demonstrations involve the rotation of a cube. They were not prepared with
cuneiform writing in mind, but they do perhaps give an idea of the way that
quaternions could be used to represent the orientation of a square-sectioned
stylus with respect to the surface of a clay tablet.
I am also thinking about what other applications involving both graphics and
text might benefit from this type of encoding. Naturally, it is not Unicode,
yet it can use Unicode character representations within it in order to
encode characters.
> The days when assembly programmers had to reuse bits and write
> self-modifying code are long gone (except, I suppose in chip microcode) --
> blown away by the hardware advances that have made memory and storage
> resources orders of magnitude cheaper than the costs of software
> development and maintenance.
Well, some of the scans for these cuneiform tablets are in the range of a
hundred megabytes for each tablet, so memory on a hard disc is still
something that needs to be minimized where this can be done without loss of
I had previously written the following.
>> Yet I feel that such application possibilities to be able to use Unicode
>> characters in conjunction with graphic data with everything encoded
>> together in an open format file are an important possibility for the
Ken commented as follows.
> Um. Have you heard of HTML and XML?
Yes, I have heard of both. I have used HTML to some extent, though not at an
advanced level, and I have not yet used XML at all. However, I saw that
excellent demonstration using a cuneiform tablet that I mentioned
previously. Since then I have been trying to learn to use the Viewpoint
Experience Technology system, which uses XML a lot, so maybe I can learn
about XML that way.
30 April 2002
This archive was generated by hypermail 2.1.2 : Tue Apr 30 2002 - 13:21:21 EDT