There are at least two packages that can show you how to do this.
Below is an old message from this list with the URLs for both packages.
Perhaps John Cowan and Mark Leisher could give you a more detailed explanation.
Just a simple question: do you plan to implement "wchar_t is UCS-4"?
From: John Cowan [SMTP:email@example.com]
Sent: June 16, 1998 11:30 PM
To: Unicode List
Subject: uctype: Unicode ctype-style package
After a six-month detour through XML and Java and related matters,
I have finally cleaned up and released uctype 2.0, a ctype-style
C/C++ character classification package for Unicode using compiled-in
tables. I have upgraded to Unicode 2.1 in the process, but Unicode 2.0
folks will probably not have any real problems.
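The message does not describe how uctype lays out its compiled-in tables. One common way to keep such a table compact is a two-stage lookup: the high bits of a code point select a 256-entry block, and every block with no properties shares a single all-zero block. The C sketch below is a minimal illustration under stated assumptions: the names `PROP_DIGIT`, `PROP_ALPHA`, `build_tables` and `has_prop` are hypothetical, the table covers only U+0000..U+FFFF, and it is filled from a few sample ranges. It is not uctype's actual layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property bits; a real package defines many more. */
#define PROP_DIGIT 0x01u
#define PROP_ALPHA 0x02u

#define BLOCK_SHIFT 8                           /* 256 code points per block */
#define BLOCK_SIZE  (1u << BLOCK_SHIFT)
#define NUM_BLOCKS  (0x10000u >> BLOCK_SHIFT)   /* toy table: U+0000..U+FFFF */
#define MAX_BLOCKS  8                           /* room for distinct stage-2 blocks */

/* Stage 1: (code point >> 8) -> index of its stage-2 block (0 = all-zero block). */
static uint8_t stage1[NUM_BLOCKS];
/* Stage 2: one property byte per code point within a block. */
static uint8_t stage2[MAX_BLOCKS][BLOCK_SIZE];
static unsigned used_blocks = 1;                /* stage2[0] stays all zero */

struct prop_range { uint32_t first, last; uint8_t props; };

/* A few sample ranges only, not a complete Unicode table. */
static const struct prop_range sample_ranges[] = {
    { 0x0030, 0x0039, PROP_DIGIT },   /* DIGIT ZERO..DIGIT NINE */
    { 0x0041, 0x005A, PROP_ALPHA },   /* A..Z */
    { 0x0061, 0x007A, PROP_ALPHA },   /* a..z */
    { 0x0966, 0x096F, PROP_DIGIT },   /* DEVANAGARI DIGIT ZERO..NINE */
};

/* Fill the tables from the range list. Blocks that receive no properties
 * keep pointing at the shared all-zero block; this sketch does not merge
 * identical non-empty blocks. */
void build_tables(void)
{
    for (size_t i = 0; i < sizeof sample_ranges / sizeof sample_ranges[0]; i++) {
        const struct prop_range *r = &sample_ranges[i];
        for (uint32_t cp = r->first; cp <= r->last; cp++) {
            uint32_t blk = cp >> BLOCK_SHIFT;
            if (stage1[blk] == 0) {             /* first property in this block */
                assert(used_blocks < MAX_BLOCKS);
                stage1[blk] = (uint8_t)used_blocks++;
            }
            stage2[stage1[blk]][cp & (BLOCK_SIZE - 1)] |= r->props;
        }
    }
}

/* Two array reads and a mask test: the same cost for every code point. */
int has_prop(uint32_t cp, unsigned prop)
{
    if (cp >= 0x10000u)
        return 0;                               /* outside this toy table */
    return (stage2[stage1[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1)] & prop) != 0;
}
```

After `build_tables()`, a call such as `has_prop(0x0967, PROP_DIGIT)` is true. In a real package a generator program would more likely emit `stage1` and `stage2` as constant initializers than fill them at startup, and would also merge identical blocks, so storage grows with the number of distinct blocks rather than with the size of the code space.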
I have also made the API source-compatible with Mark Leisher's
excellent ucdata package, which stores its tables in external
(binary) files. There are minor differences: uctype supports
Unicode numeric values, ucdata supports decomposing precomposed
characters. The uctype distribution includes a man page that (by intention)
documents both packages equally.
Download uctype at http://www.ccil.org/~cowan/uctype-2.0.tar.gz ;
ucdata is available at
-- John Cowan http://www.ccil.org/~cowan firstname.lastname@example.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
On May 05, 1999 7:15 PM, G. Adam Stanislav [SMTP:email@example.com] wrote:
> Can someone help Adam? This one is beyond my ken.
> ME
> =============
> Hello Michael,
>
> I am coordinating the development of the FreeBSD (Unix) implementation of
> that section of the C library which is defined in <wchar.h> and <wctype.h>.
>
> I am a bit confused about the planes in ISO-10646. Where on the web can I
> find a description of these planes?
>
> Also, I am looking for algorithms to implement the iswctype and towctrans
> functions. In the traditional C library, the ctype functions such as
> isdigit are usually implemented via a 256-byte lookup table, with
> individual bits either set or not set.
>
> Obviously, this cannot be done (not at present, anyway) with the wide
> 31-bit characters of ISO-10646 since it would require a table of 4 Gigabytes.
>
> Are there any algorithms for the implementation of these functions that I
> should be aware of before trying to reinvent the wheel?
>
> What I have been thinking about doing in this area is creating several
> separate tables for each type which would simply list all 31-bit codes that
> represent that type, and do so in sorted order.
>
> For example, a table of all wchars that represent a digit would allow me to
> perform a quick search to see if any value is inside the table. If it is,
> the function can return TRUE, otherwise it will return FALSE. Alas, this
> seems an imperfect solution as there is no way of knowing what future
> extensions will be added to ISO-10646 (I have seen quite a number of
> proposals for such extensions on your web site and there, no doubt, will be
> more).
>
> Is there a better way? Is there a system to this? What I mean is, is there
> some way of knowing that if for example a specific bit in character code is
> set, it is a digit? Or if another bit is set, it is an alphabetic letter?
>
> I would appreciate any suggestions that may help me and other FreeBSD
> developers to add this functionality to our C library.
>
> Thank you,
>
> Adam
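On the question of planes: ISO/IEC 10646 structures its 31-bit UCS-4 code space as four octets, group, plane, row and cell, giving 128 groups of 256 planes of 256 rows of 256 cells. Group 0, plane 0 is the Basic Multilingual Plane, and UTF-16 surrogate pairs reach planes 0 to 16 of group 0, i.e. up to U+10FFFF. A minimal C sketch of the split follows; the names `ucs4_parts`, `ucs4_split` and `ucs4_in_utf16_range` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The four octets of a UCS-4 code: group, plane, row, cell. */
struct ucs4_parts {
    unsigned group;   /* bits 24..30: 0..127 */
    unsigned plane;   /* bits 16..23: 0..255 */
    unsigned row;     /* bits 8..15:  0..255 */
    unsigned cell;    /* bits 0..7:   0..255 */
};

struct ucs4_parts ucs4_split(uint32_t code)
{
    struct ucs4_parts p;
    assert(code <= 0x7FFFFFFFu);      /* UCS-4 is a 31-bit code space */
    p.group = (code >> 24) & 0x7Fu;
    p.plane = (code >> 16) & 0xFFu;
    p.row   = (code >> 8) & 0xFFu;
    p.cell  = code & 0xFFu;
    return p;
}

/* Group 0, planes 0..16: the part of UCS-4 reachable through UTF-16. */
int ucs4_in_utf16_range(uint32_t code)
{
    return code <= 0x10FFFFu;
}
```

Because no property is encoded in these bits, the split only locates a character; classification still needs tables.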
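Adam's sorted-table idea can be made much smaller by storing sorted, non-overlapping ranges instead of individual codes, since digits and letters mostly come in contiguous runs; the standard C `bsearch` then answers membership with O(log n) comparisons. The sketch below assumes hypothetical names (`cp_range`, `in_ranges`, `is_decimal_digit`) and lists only a handful of sample digit ranges, not a complete Unicode table. Note that C's own `iswdigit` is defined to match only the ten digits `0` through `9`, so a table like this would back a broader character class rather than `iswdigit` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct cp_range { uint32_t first, last; };   /* inclusive bounds */

/* Sorted, non-overlapping sample ranges of decimal digits (not complete). */
static const struct cp_range digit_ranges[] = {
    { 0x0030, 0x0039 },   /* DIGIT ZERO..DIGIT NINE */
    { 0x0660, 0x0669 },   /* ARABIC-INDIC DIGIT ZERO..NINE */
    { 0x06F0, 0x06F9 },   /* EXTENDED ARABIC-INDIC DIGIT ZERO..NINE */
    { 0x0966, 0x096F },   /* DEVANAGARI DIGIT ZERO..NINE */
    { 0xFF10, 0xFF19 },   /* FULLWIDTH DIGIT ZERO..NINE */
};

/* bsearch passes the key first and the array element second. */
static int cmp_cp_range(const void *key, const void *elem)
{
    uint32_t cp = *(const uint32_t *)key;
    const struct cp_range *r = elem;
    if (cp < r->first)
        return -1;
    if (cp > r->last)
        return 1;
    return 0;                          /* cp lies inside this range */
}

/* Binary search over sorted ranges. */
int in_ranges(uint32_t cp, const struct cp_range *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    return bsearch(&cp, tab, n, sizeof tab[0], cmp_cp_range) != NULL;
}

int is_decimal_digit(uint32_t cp)
{
    return in_ranges(cp, digit_ranges, sizeof digit_ranges / sizeof digit_ranges[0]);
}
```

When the standard grows, supporting new characters is a data change only: add a range to the sorted table and the lookup code stays the same.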
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT