"uctype.h": a Unicode-based character classification API

From: John Cowan (cowan@locke.ccil.org)
Date: Tue Feb 10 1998 - 17:42:01 EST


As I am dissatisfied with the constraints that the POSIX "ctype.h"
classification API puts on characters, and its over-specificity
compared with the Unicode model, I have built and tested an
alternative API known as "uctype.h", and I am now releasing the
source code for Version 2.0. (Version 1.0 was differently
conceived and never made it out the door.)

The API still maintains the flavor of "ctype.h", but allows access
to every property in the Unicode character database
(UnicodeData-Latest.txt) except the name and decomposition properties.
There are 39 uct_is* selectors, plus uct_getbidi, uct_getclass,
uct_getnumber, uct_getdigit, uct_toupper, uct_tolower, and uct_totitle
functions. Nonetheless, only about 6-7 kilobytes of data space
are required.
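
For illustration only, here is a minimal sketch of how a "ctype.h"-flavored
lookup over compact tables might work. Everything in it is an assumption:
the demo_* names, the demo_cp type, and the four-row table are hypothetical
stand-ins rather than the actual uctype.h declarations or data, and the
range-table layout is just one plausible way to keep data space small,
not necessarily the method this package uses.

```c
#include <stddef.h>

/* Hypothetical code-point type; the announcement does not say which
   type the real uct_* functions take or return. */
typedef unsigned long demo_cp;

/* One run of consecutive code points sharing the same properties. */
struct demo_range {
    demo_cp first;        /* first code point in the run */
    demo_cp last;         /* last code point in the run (inclusive) */
    int     is_upper;     /* nonzero if the run holds uppercase letters */
    demo_cp lower_delta;  /* amount to add to reach lowercase, 0 if none */
};

/* Tiny illustrative table, sorted by 'first'. NOT the real Unicode data. */
static const struct demo_range demo_table[] = {
    { 0x0041, 0x005A, 1, 0x20 },  /* LATIN CAPITAL LETTER A..Z */
    { 0x0061, 0x007A, 0, 0    },  /* LATIN SMALL LETTER A..Z */
    { 0x0391, 0x03A1, 1, 0x20 },  /* GREEK CAPITAL LETTER ALPHA..RHO */
    { 0x0410, 0x042F, 1, 0x20 },  /* CYRILLIC CAPITAL LETTER A..YA */
};

/* Binary search for the run containing c; NULL if c is not in the table. */
static const struct demo_range *demo_find(demo_cp c)
{
    size_t lo = 0;
    size_t hi = sizeof demo_table / sizeof demo_table[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < demo_table[mid].first)
            hi = mid;
        else if (c > demo_table[mid].last)
            lo = mid + 1;
        else
            return &demo_table[mid];
    }
    return NULL;
}

/* Selector in the style of the uct_is* family (assumed signature). */
int demo_isupper(demo_cp c)
{
    const struct demo_range *r = demo_find(c);
    return r != NULL && r->is_upper;
}

/* Case mapping in the style of uct_tolower (assumed signature):
   characters with no mapping are returned unchanged. */
demo_cp demo_tolower(demo_cp c)
{
    const struct demo_range *r = demo_find(c);
    return r != NULL ? c + r->lower_delta : c;
}
```

With this sketch, demo_isupper(0x0391) is nonzero and
demo_tolower(0x0410) yields 0x0430, while a code point absent from the
table, such as 0x4E00, comes back unchanged from demo_tolower.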

It may be reckoned an advantage or a disadvantage that this package
is stand-alone, and not part of a more complex application framework.

The code is written in ISO/IEC C and is highly portable, having
essentially no dependencies on the environment. A suite of programs
in C and Perl is provided for those who wish to add their own
characters or change existing properties; the suite tests that all the
properties are consistently provided.

The code is released under an MIT/X-Consortium license: it is free
for any use whatever, proprietary or not, and may be freely modified
by anyone provided the copyright notice is preserved.

O Sarasvati: I would love to see this code in a
/Public/SOFTWARE/CONTRIB subdirectory. Is this possible?
If so, tell me where and how to upload it. If not, I will post a
URL in due course.

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (FW 16.5)


