Optimal Unicode 3.x Character Attributes and Access Methods
Ienup Sung - Sun Microsystems, Inc.
Unicode Version 3.1 defines and encodes a total of 94,140 characters, about 8.45% of the maximum number of characters that Unicode can represent. These characters are spread over a rather wide span of the Unicode coding space and are concentrated in Planes 0, 1, 2, and 14.
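The 8.45% figure can be checked with a few lines of arithmetic. The sketch below assumes the coding space is the full 17 planes of 65,536 code points each (U+0000 through U+10FFFF); the constant and function names are illustrative and not taken from the presentation.

```python
# Size of the Unicode coding space: 17 planes of 0x10000 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
CODE_SPACE = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112 code points

# Number of characters encoded in Unicode 3.1, as stated in the abstract.
ENCODED_3_1 = 94_140


def coverage_percent(encoded, space=CODE_SPACE):
    """Return the share of the coding space that is populated, in percent."""
    return 100.0 * encoded / space


def plane_of(code_point):
    """Return the plane number (0..16) that a code point belongs to."""
    return code_point >> 16


print(f"{coverage_percent(ENCODED_3_1):.2f}% of {CODE_SPACE:,} code points")
print("U+E0001 is in plane", plane_of(0xE0001))
```

The sparse population and the gap between Plane 2 and Plane 14 are what make the choice of data structure matter: a flat array indexed by code point would be mostly empty.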
Every character defined in Unicode has various character attribute values, such as character classes and collation weights. Application programs use these attribute values quite frequently. For this reason, most underlying platform software provides such character attribute values to upper layers of software through a set of programming interfaces as a supported feature.
Even though computer hardware and software system resources are becoming more economical every day, platform software still needs to provide this functionality in a way that uses minimal system resources and yet is fast enough for its users to achieve the best possible performance.
The goal of this technical presentation is to present several generic and Unicode 3.x-specific data structures and access methods, and to provide comprehensive analyses and comparisons among them, both theoretical and empirical. The theoretical study provides best-, average-, and worst-case system resource consumption and execution time. The empirical study is based on actual measurements of system resource consumption and execution time on various input data over several common hardware configurations for both client and server workstations.
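The abstract does not name the data structures it compares. As a rough illustration of the kind of trade-off involved, the sketch below contrasts two generic access methods that are commonly used for sparse character attributes: binary search over sorted ranges (compact, logarithmic time) and a two-stage table with shared blocks (larger, constant time). The attribute names, the sample ranges, and all function names are hypothetical and are not taken from the presentation.

```python
import bisect

# Hypothetical attribute data: sorted, non-overlapping (first, last, attribute).
RANGES = [
    (0x0030, 0x0039, "digit"),
    (0x0041, 0x005A, "upper"),
    (0x0061, 0x007A, "lower"),
    (0x20000, 0x2A6D6, "ideograph"),
    (0xE0020, 0xE007F, "tag"),
]
DEFAULT = "other"
STARTS = [first for first, _, _ in RANGES]


def lookup_ranges(cp):
    """Access method 1: binary search over the range table, O(log n)."""
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return DEFAULT


BLOCK_BITS = 8
BLOCK_SIZE = 1 << BLOCK_BITS
BLOCK_MASK = BLOCK_SIZE - 1


def build_two_stage(max_cp=0x10FFFF):
    """Build a two-stage table; identical 256-entry blocks are stored once."""
    blocks = []     # stage 2: the distinct blocks of attribute values
    index_of = {}   # block contents -> position in `blocks`
    stage1 = []     # stage 1: one block index per 256 code points
    for base in range(0, max_cp + 1, BLOCK_SIZE):
        block = tuple(lookup_ranges(cp) for cp in range(base, base + BLOCK_SIZE))
        if block not in index_of:
            index_of[block] = len(blocks)
            blocks.append(block)
        stage1.append(index_of[block])
    return stage1, blocks


STAGE1, BLOCKS = build_two_stage()


def lookup_two_stage(cp):
    """Access method 2: two array indexings, O(1)."""
    return BLOCKS[STAGE1[cp >> BLOCK_BITS]][cp & BLOCK_MASK]


print(len(STAGE1), "stage-1 entries share", len(BLOCKS), "distinct blocks")
```

Because most of the coding space is unassigned, most stage-1 entries point at the same all-default block, which is why block sharing keeps the constant-time table small. Measuring both methods on realistic input is the kind of empirical comparison the abstract describes.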
22 Jun 2001, Webmaster