A Generalized Mechanism for Unicode Metadata
Steven Atkin - IBM Corporation & Ryan Stansifer - Florida Tech
The many competing motivations for selecting codepoints in the Unicode standard threaten the supreme purpose of a character encoding: data. Digital data is immensely convenient because the advantages of its great simplicity outweigh the losses incurred by representing knowledge imperfectly. Increases in computing power permit us to begin recovering what has been left out. Yet the very richness of the collection of Unicode characters has made the interpretation of text more difficult. Algorithms and technical reports are now necessary to understand raw streams of Unicode characters.
We propose a general mechanism for conveying metadata within Unicode. The mechanism sharpens the conceptual boundary between codepoints and text processing, and it is both flexible and extensible. Furthermore, algorithms such as the bidirectional algorithm can be recast in such a way that their effects become detectable and reversible.
22 Jun 2001, Webmaster