Text and Anti-Text

From: Dean A. Snyder (dean.snyder@jhu.edu)
Date: Fri Aug 13 1999 - 12:02:32 EDT


This is my first post to the Unicode list.

I am considering making a 10646 architectural proposal and have decided to
run it past the readers of this list first.

My proposal is conceptually very simple, but could have far-reaching
implications for the text markup, preservation format, and text programming

I propose that 10646 reserve the 32nd bit as a flag bit signifying text
versus meta-text. When the bit IS NOT set, the character is to be treated as
text, with its current value in the standard unchanged; when the bit IS set,
the character is to be treated as meta-text, with its remaining 31 bits
holding the same value as its non-bit-set counterpart in the current standard.

For example (in binary notation):
  textual "a" would continue to be 00000000 00000000 00000000 01100001,
  whereas meta-textual "a" would be 10000000 00000000 00000000 01100001.

(One aside - unlike the typical two's-complement representation of signed
integers, which ends up with a single zero and an unbalanced number of
positive and negative integers, I would suggest that 00000000 00000000
00000000 00000000 represent textual nil/"zero" and 10000000 00000000
00000000 00000000 represent meta-textual nil/"zero".)
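
The flag-bit scheme above can be sketched in a few lines of Python. This is
only an illustration of the idea, not part of 10646 or of any existing
library; the names META_FLAG, to_meta, to_text, is_meta, and bits are
hypothetical and chosen just for this example.

```python
# Hypothetical sketch: the 32nd bit is assumed to be the text/meta-text flag.
META_FLAG = 1 << 31


def to_meta(code: int) -> int:
    """Return the meta-textual counterpart of a code value (flag bit set)."""
    return code | META_FLAG


def to_text(code: int) -> int:
    """Return the textual counterpart of a code value (flag bit cleared)."""
    return code & ~META_FLAG


def is_meta(code: int) -> bool:
    """True when the flag bit is set, i.e. the value is meta-text."""
    return bool(code & META_FLAG)


def bits(code: int) -> str:
    """Format a value as four space-separated 8-bit groups."""
    raw = format(code, "032b")
    return " ".join(raw[i:i + 8] for i in range(0, 32, 8))


print(bits(ord("a")))           # textual "a"
print(bits(to_meta(ord("a"))))  # meta-textual "a"
```

Under this sketch, textual and meta-textual nil are simply 0 and
to_meta(0), so every textual value has exactly one meta-textual twin.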

In my view this proposal has the following benefits:

1) There would no longer be any need in markup languages for complex escape
sequence schemes in which "magic" characters are reserved for delimiting
meta-textual information, and are thereby disallowed in text itself except,
typically, through some complex double-escaping mechanism.

2) Meta-text parsers can focus on parsing meta-text rather than additionally
implementing some complicated algorithm for separating out meta-text from
text.

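3) Parsing would be much faster, since telling text from meta-text would
take a single bit test per character rather than escape processing.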

4) There would be no limitations placed upon what could be represented as
text versus meta-text, since they would be "exact" duplicates of one
another. (I like what one commenter said at the ACH/ALLC conference this
summer, "I like it - it's sort of like matter and anti-matter, text and
anti-text.")

5) From a universal preservation format point of view this would greatly
simplify markup algorithms, making them easier to decode and re-implement.
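
As a rough illustration of benefits 1 through 3, here is a minimal Python
sketch of separating text from meta-text with nothing but a bit test. It
assumes the stream is available as a list of 32-bit integer values, since
no existing encoding carries the proposed flag bit; META_FLAG, encode, and
split_runs are hypothetical names used only for this example.

```python
from itertools import groupby

# Hypothetical: the 32nd bit is assumed to mark meta-text.
META_FLAG = 1 << 31


def encode(text, meta=False):
    """Turn a string into a list of code values, flagged if meta-text."""
    flag = META_FLAG if meta else 0
    return [ord(ch) | flag for ch in text]


def split_runs(codes):
    """Yield (is_meta, string) pairs for consecutive runs of one kind."""
    for meta, run in groupby(codes, key=lambda c: bool(c & META_FLAG)):
        # Clear the flag bit to recover the ordinary character value.
        yield meta, "".join(chr(c & ~META_FLAG) for c in run)


# The "<" inside the text needs no escaping: it is told apart from the
# markup "<" by the flag bit alone.
stream = encode("<b>", meta=True) + encode("a < b") + encode("</b>", meta=True)
runs = list(split_runs(stream))
for meta, chunk in runs:
    print("meta" if meta else "text", repr(chunk))
```

A meta-text parser built this way would only ever be handed the flagged
runs, which is the separation of concerns that point 2 describes.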


Dean A. Snyder

P.S. For those who would object that "10646 is a 31-bit standard", I would
only ask why then it has entertained in the past other proposals regarding
this 32nd bit?

Dean A. Snyder
Senior Information Technology Specialist
The Johns Hopkins University
Hopkins Information Technology Services
Research and Instructional Technologies
18 Garland Hall, 3400 N. Charles St.
Baltimore, Maryland (MD), USA 21218

Office: 410 516-6021
Mobile: 410 961-8943
Fax: 410 516-5508
Email: dean.snyder@jhu.edu
