"Dean A. Snyder" <email@example.com>:
DS> I propose that 10646 reserve the 32nd bit as a flag bit signifying text
DS> versus meta-text - when the bit IS NOT set, the glyph is to be treated as
DS> text with its current value in the standard unchanged; when the bit IS set,
DS> the glyph is to be treated as meta-text, having the same value as its
DS> non-bit-set counterpart in the current standard.
The Unicode Standard clearly states that it defines 21-bit codes (if
memory serves), and that it will never go beyond this range. I
trust them. (I'm not clear about ISO 10646, but I have been led to
understand that it will not go beyond this range either.)
This is extremely important for implementations. If, for some reason,
I need to store characters one per word, and I'm using a 32-bit
architecture, I know that I can use 11 bits for my own purposes, and
know that I won't need to make any fundamental changes to the data
structures when the standard, I mean The Standard, is updated.
I'll give you an example. Many programming language implementations
use *tagged* data, i.e. keep data in a format that carries a /type
tag/ with every datum. For example, Lisp implementations use anywhere
between 2 and 8 bits for a tag. If I implement a Unicode-based Lisp
on a 32-bit machine, I might use 21 bits for the codepoint, 3 bits as
bucky bits (meta, super, hyper), and 8 bits for the tag. Or use 21
bits as the codepoint, 8 bits for the tag, and 3 bits for non-UCS
characters (a huge private private zone -- for the private use of the
implementation, rather than for the private use of the programmer).
Or else, 3 bits for the tag, 21 bits for the codepoint, and 8 bits for
But the actual choices are not important. What is important is that
once I commit to a choice, I can hardwire all the tag-processing code
for maximum speed without having to worry about what will happen when
the space of codepoints expands.
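
As a rough sketch of the first layout mentioned above (21 bits for the
codepoint, 3 bucky bits, 8 bits for the tag), the hardwired code might
look something like the following. The bit positions, the tag value
TAG_CHAR, and all function and constant names are assumptions made up
for illustration; they are not taken from any particular Lisp
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 32-bit tagged word (illustrative only):
 *   bits  0..7   type tag              (8 bits)
 *   bits  8..28  codepoint             (21 bits, U+0000..U+10FFFF)
 *   bits 29..31  bucky bits            (meta, super, hyper)
 */
enum {
    TAG_BITS    = 8,
    CP_BITS     = 21,
    CP_SHIFT    = TAG_BITS,
    BUCKY_SHIFT = TAG_BITS + CP_BITS
};

#define TAG_MASK      0xFFu
#define CP_MASK       0x1FFFFFu
#define BUCKY_MASK    0x7u
#define MAX_CODEPOINT 0x10FFFFu

#define TAG_CHAR      0x2Au  /* arbitrary tag value chosen for this sketch */

#define BUCKY_META    0x1u
#define BUCKY_SUPER   0x2u
#define BUCKY_HYPER   0x4u

/* Pack a codepoint and its bucky bits into one tagged word. */
static inline uint32_t make_char(uint32_t cp, uint32_t bucky)
{
    assert(cp <= MAX_CODEPOINT);
    assert(bucky <= BUCKY_MASK);
    return (bucky << BUCKY_SHIFT) | (cp << CP_SHIFT) | TAG_CHAR;
}

/* Type check: a single mask-and-compare on the low 8 bits. */
static inline int is_char(uint32_t word)
{
    return (word & TAG_MASK) == TAG_CHAR;
}

/* Extract the 21-bit codepoint. */
static inline uint32_t char_codepoint(uint32_t word)
{
    return (word >> CP_SHIFT) & CP_MASK;
}

/* Extract the 3 bucky bits (they occupy the top of the word). */
static inline uint32_t char_bucky(uint32_t word)
{
    return word >> BUCKY_SHIFT;
}
```

Every shift and mask here is a compile-time constant, which is exactly
what would have to be revisited if the codepoint field ever needed more
than 21 bits.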
Please, do not break my deal with the Unicode Consortium. I trust
them to define the meaning of 21 bits for me -- but the remaining 11