From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Tue May 04 2004 - 06:09:07 CDT
At this time there are about 160 different character properties defined
in the UCD. In practice most applications probably only use a limited set
of properties to work with. Nevertheless, applications should be able to
look up all the properties of a code point. Compiling in lookup tables for
all defined properties (including Unihan) makes small applications
rather big. This made me decide to create a binary file format for storing
character properties and to initialize property lookup tables on demand.
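To illustrate what initializing a lookup table on demand can look like, here is a minimal C sketch. It is not the UPR file format or the UPR library API: the record layout (PropRange), the names prop_write, prop_load and prop_lookup, and the host-byte-order file are all assumptions made up for this example. The point is only that nothing is read from disk until the first lookup of a property.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record layout for this sketch only (NOT the real UPR format):
   an inclusive code point range plus one integer property value. */
typedef struct {
    uint32_t first;
    uint32_t last;
    int32_t  value;
} PropRange;

typedef struct {
    const char *path;    /* file holding the values of exactly one property */
    PropRange  *ranges;  /* NULL until the first lookup */
    size_t      count;
    int         loaded;
} PropTable;

/* Demo helper: write sorted, non-overlapping ranges in host byte order.
   Returns 0 on success, -1 on failure. */
int prop_write(const char *path, const PropRange *r, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(r, sizeof *r, n, f);
    int rc = fclose(f);
    return (written == n && rc == 0) ? 0 : -1;
}

/* Read the whole file into memory. Returns 0 on success, -1 on failure. */
int prop_load(PropTable *t)
{
    FILE *f = fopen(t->path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }
    size_t n = (size_t)size / sizeof(PropRange);
    PropRange *r = malloc(n ? n * sizeof *r : 1);
    if (!r) { fclose(f); return -1; }
    if (fread(r, sizeof *r, n, f) != n) { free(r); fclose(f); return -1; }
    fclose(f);
    t->ranges = r;
    t->count = n;
    t->loaded = 1;
    return 0;
}

/* Look up cp; the file is loaded on the first call only.
   Returns dflt if cp is not covered or the file cannot be loaded. */
int32_t prop_lookup(PropTable *t, uint32_t cp, int32_t dflt)
{
    assert(t != NULL);
    if (!t->loaded && prop_load(t) != 0) return dflt;
    size_t lo = 0, hi = t->count;
    while (lo < hi) {  /* binary search over the sorted ranges */
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t->ranges[mid].first) hi = mid;
        else if (cp > t->ranges[mid].last) lo = mid + 1;
        else return t->ranges[mid].value;
    }
    return dflt;
}
```

An application would keep one PropTable per property and call prop_lookup whenever it needs a value; properties that are never queried never cost any memory or I/O.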
Benefits of using run-time loadable lookup tables initialized from binary
files:
- no worries about total table size, since data will only be loaded
on demand
- initializing lookup tables from a binary file is relatively fast
- property lookup files can be locale specific (useful for character
names and case mappings for example)
- new properties can be added quickly and never affect layout or
content of other tables
- any number of properties can be supported, including custom ones
- by initializing a lookup table from two sources (UCD-based and
vendor-based), applications can overload the default property
values assigned to PUA characters with private property values
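The last point, overriding default values for PUA characters, can be sketched as follows. This is not how the UPR library does it: the post describes initializing one table from two sources, whereas this simplified example keeps two separate in-memory tables and consults the vendor table first at lookup time. The types Range and Table and the functions table_find, is_private_use and overlay_lookup are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory tables for this sketch (not the UPR API). */
typedef struct { uint32_t first, last; int32_t value; } Range;
typedef struct { const Range *ranges; size_t count; } Table;

/* Linear scan; returns 1 and stores the value in *out if cp is covered. */
int table_find(const Table *t, uint32_t cp, int32_t *out)
{
    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (cp >= t->ranges[i].first && cp <= t->ranges[i].last) {
            *out = t->ranges[i].value;
            return 1;
        }
    }
    return 0;
}

/* Private Use Areas: U+E000..U+F8FF plus most of planes 15 and 16. */
int is_private_use(uint32_t cp)
{
    return (cp >= 0xE000 && cp <= 0xF8FF) ||
           (cp >= 0xF0000 && cp <= 0xFFFFD) ||
           (cp >= 0x100000 && cp <= 0x10FFFD);
}

/* A vendor value wins for PUA code points; everything else, and any PUA
   code point the vendor table does not cover, falls back to the default
   (UCD-based) table, then to dflt. */
int32_t overlay_lookup(const Table *ucd, const Table *vendor,
                       uint32_t cp, int32_t dflt)
{
    int32_t v;
    if (is_private_use(cp) && table_find(vendor, cp, &v)) return v;
    if (table_find(ucd, cp, &v)) return v;
    return dflt;
}
```

Restricting the override to PUA code points keeps a vendor file from accidentally changing the properties of assigned characters.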
The file format I've implemented is capable of storing any type of property.
Each file contains property values for one property (no more squeezing as
many property values as possible into as few bits as possible). The format
is called UPR (Unicode PRoperties).
I have written a tool to generate the necessary UPR files from the UCD. A
small C library for reading a UPR file into a property lookup table and
a high-level library that provides property lookup functions for *all*
Unicode properties in 4.0.0 are also available.
For more information on the file format and related software see:
http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/. My primary
development platform is UNIX/Linux, but you can compile and run it under
Windows as well (though it is less tested there). The current version
supports UCD 4.0.0; I will add support for 4.0.1 soon.
Please check it out. Feedback is welcome.