Nineteenth International Unicode Conference

A Generalized Mechanism for Unicode Metadata

Steven Atkin - IBM Corporation & Ryan Stansifer - Florida Tech

Intended Audience:	Software Engineer, Systems Analyst
Session Level:	Intermediate

The many competing motivations for selecting codepoints in the Unicode standard threaten the supreme purpose of a character encoding: data. Digital data is immensely convenient because the advantages of its great simplicity outweigh the losses incurred by representing knowledge imperfectly. Increases in computing power permit us to begin recovering what has been left out. Yet the very richness of the collection of Unicode characters has made the interpretation of text more difficult. Algorithms and reports are now necessary to understand raw streams of Unicode characters.

We propose a general mechanism for conveying metadata within Unicode. The conceptual boundary between codepoints and text processing is sharpened. The approach is both flexible and extensible. Furthermore, algorithms such as the bidirectional algorithm can be recast in such a way that they become detectable and reversible.

When the world wants to talk, it speaks Unicode

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

22 Jun 2001, Webmaster