Re: Text and Anti-Text

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Fri Aug 13 1999 - 14:00:17 EDT


"Dean A. Snyder" <dean.snyder@jhu.edu>:

DS> I propose that 10646 reserve the 32nd bit as a flag bit signifying text
DS> versus meta-text - when the bit IS NOT set, the glyph is to be treated as
DS> text with its current value in the standard unchanged; when the bit IS set,
DS> the glyph is to be treated as meta-text, having the same value as its
DS> non-bit-set counterpart in the current standard.

Please, don't.

The Unicode Standard clearly states that it defines 21-bit codes (if
memory serves), and that they will never go beyond this range. I
trust them. (I'm not clear about ISO 10646, but I have been led to
understand that neither will they.)

This is extremely important for implementations. If, for some reason,
I need to store characters one per word, and I'm using a 32-bit
architecture, I know that I can use 11 bits for my own purposes, and
know that I won't need to make any fundamental changes to the data
structures when the standard, I mean The Standard, is updated.
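
The arithmetic behind those 11 bits can be checked in a few lines.
This is only a sketch in Python: the constant names are illustrative,
and it assumes the ceiling on code points is U+10FFFF and that
characters are stored one per 32-bit word.

```python
# Assumption: U+10FFFF is the highest code point that will ever be assigned.
MAX_CODE_POINT = 0x10FFFF
# Assumption: one character per word on a 32-bit architecture.
WORD_BITS = 32

# Bits needed to represent the largest code point.
code_point_bits = MAX_CODE_POINT.bit_length()
# Bits left over in the word for the implementation's own purposes.
spare_bits = WORD_BITS - code_point_bits

print(code_point_bits, spare_bits)
```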

I'll give you an example. Many programming language implementations
use *tagged* data, i.e. keep data in a format that carries a /type
tag/ with every datum. For example, Lisp implementations use anywhere
between 2 and 8 bits for a tag. If I implement a Unicode-based Lisp
on a 32-bit machine, I might use 21 bits for the codepoint, 3 bits as
bucky bits (meta, super, hyper), and 8 bits for the tag. Or use 21
bits as the codepoint, 8 bits for the tag, and 3 bits for non-UCS
characters (a huge private private zone -- for the private use of the
implementation, rather than for the private use of the programmer).
Or else, 3 bits for the tag, 21 bits for the codepoint, and 8 bits for
style information.
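
The first of these layouts might look roughly as follows. This is a
sketch under stated assumptions, not any real Lisp's representation:
the field order (tag in the low bits), the tag value CHAR_TAG, and the
helper names pack_char and unpack_char are all hypothetical.

```python
# Field widths from the first layout: 21 + 3 + 8 = 32 bits.
CODE_BITS, BUCKY_BITS, TAG_BITS = 21, 3, 8
assert CODE_BITS + BUCKY_BITS + TAG_BITS == 32

# Assumed field order, low bits to high bits: tag, bucky bits, code point.
BUCKY_SHIFT = TAG_BITS
CODE_SHIFT = TAG_BITS + BUCKY_BITS

# Masks are fixed constants, so the tag-processing code can be hardwired.
TAG_MASK = (1 << TAG_BITS) - 1
BUCKY_MASK = (1 << BUCKY_BITS) - 1
CODE_MASK = (1 << CODE_BITS) - 1

# One bit per bucky modifier.
META, SUPER, HYPER = 1, 2, 4

# Hypothetical type tag meaning "this word holds a character".
CHAR_TAG = 0x2A


def pack_char(code_point, bucky=0, tag=CHAR_TAG):
    """Pack a code point, bucky bits, and type tag into one 32-bit word."""
    if not 0 <= code_point <= CODE_MASK:
        raise ValueError("code point does not fit in 21 bits")
    if not 0 <= bucky <= BUCKY_MASK:
        raise ValueError("bucky bits do not fit in 3 bits")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 8 bits")
    return (code_point << CODE_SHIFT) | (bucky << BUCKY_SHIFT) | tag


def unpack_char(word):
    """Return (code_point, bucky, tag) from a packed 32-bit word."""
    return (
        (word >> CODE_SHIFT) & CODE_MASK,
        (word >> BUCKY_SHIFT) & BUCKY_MASK,
        word & TAG_MASK,
    )


# Usage: meta-hyper-A round-trips through a single word.
word = pack_char(ord("A"), META | HYPER)
print(unpack_char(word) == (ord("A"), META | HYPER, CHAR_TAG))
```

If the code point field ever had to grow past 21 bits, every shift and
mask above would change, along with any stored data that relies on
them.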

But the actual choices are not important. What is important is that
once I commit to a choice, I can hardwire all the tag-processing code
for maximum speed without having to worry about what will happen when
the space of codepoints expands.

Please, do not break my deal with the Unicode Consortium. I trust
them to define the meaning of 21 bits for me -- but the remaining 11
are mine.

                                        J.


