wchar.h, wctype.h question

From: G. Adam Stanislav (adam@whizkidtech.net)
Date: Wed May 05 1999 - 13:20:29 EDT


Can someone help Adam? This one is beyond my ken.
ME
=============
Hello Michael,

I am coordinating the development of the FreeBSD (Unix) implementation of
the part of the C library that is declared in <wchar.h> and <wctype.h>.

I am a bit confused about the planes in ISO-10646. Where on the web can I
find a description of these planes?

Also, I am looking for algorithms to implement the iswctype and towctrans
functions. In the traditional C library, the ctype functions such as
isdigit are usually implemented via a 256-byte lookup table, with one bit
per character class either set or not set in each entry.
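For comparison, the traditional scheme can be sketched in C as follows. This is a minimal illustration, not any particular library's code: the bit names (CT_DIGIT and so on), the ct_init routine and the my_isdigit/my_isalpha functions are hypothetical, and the table is filled for ASCII only.

```c
#include <stdio.h>  /* EOF */

/* Hypothetical class bits; each C library picks its own names and values. */
#define CT_DIGIT 0x01
#define CT_UPPER 0x02
#define CT_LOWER 0x04
#define CT_SPACE 0x08

/* One byte of class bits per 8-bit character code. */
static unsigned char ct_table[256];

/* Fill the table for ASCII; real libraries ship a precomputed static table. */
static void ct_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) ct_table[c] |= CT_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) ct_table[c] |= CT_UPPER;
    for (c = 'a'; c <= 'z'; c++) ct_table[c] |= CT_LOWER;
    ct_table[' '] |= CT_SPACE;
    for (c = '\t'; c <= '\r'; c++) ct_table[c] |= CT_SPACE; /* \t \n \v \f \r */
}

/* Each classification is a single table lookup plus a bit test. */
static int my_isdigit(int c)
{
    if (c == EOF)
        return 0;
    return (ct_table[(unsigned char)c] & CT_DIGIT) != 0;
}

static int my_isalpha(int c)
{
    if (c == EOF)
        return 0;
    return (ct_table[(unsigned char)c] & (CT_UPPER | CT_LOWER)) != 0;
}
```

Because every entry holds several class bits, one 256-byte table serves all of the is* functions at once; it is this one-byte-per-code layout that does not scale to the full ISO-10646 code space.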

Obviously, this cannot be done (not at present, anyway) with the 31-bit wide
characters of ISO-10646, since it would require a table of 2 gigabytes (one
byte for each of the 2^31 possible codes).

Are there any algorithms for the implementation of these functions that I
should be aware of before trying to reinvent the wheel?

What I have been thinking about doing in this area is creating a separate
table for each type, listing in sorted order all the 31-bit codes that
belong to that type.

For example, a table of all wchars that represent a digit would let me
perform a quick binary search to see whether a given value is in the table.
If it is, the function returns TRUE; otherwise it returns FALSE. Alas, this
seems an imperfect solution, as there is no way of knowing what future
extensions will be added to ISO-10646 (I have seen quite a number of
proposals for such extensions on your web site, and there will no doubt be
more).
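The sorted-table idea can be sketched in C with the standard bsearch function. This is a minimal sketch under stated assumptions: code points are held in uint32_t rather than wchar_t (whose width varies between platforms), and digit_codes, in_class and is_digit_code are hypothetical names. The table is deliberately partial, holding only the ASCII digits (U+0030 to U+0039) and the Arabic-Indic digits (U+0660 to U+0669).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>  /* bsearch */

/* Hypothetical, partial class table: must stay sorted in ascending order. */
static const uint32_t digit_codes[] = {
    0x0030, 0x0031, 0x0032, 0x0033, 0x0034,  /* ASCII 0-4 */
    0x0035, 0x0036, 0x0037, 0x0038, 0x0039,  /* ASCII 5-9 */
    0x0660, 0x0661, 0x0662, 0x0663, 0x0664,  /* Arabic-Indic 0-4 */
    0x0665, 0x0666, 0x0667, 0x0668, 0x0669   /* Arabic-Indic 5-9 */
};

/* Three-way comparison for bsearch; avoids overflow from subtracting. */
static int cmp_u32(const void *key, const void *elem)
{
    uint32_t a = *(const uint32_t *)key;
    uint32_t b = *(const uint32_t *)elem;
    return (a > b) - (a < b);
}

/* Returns 1 if wc is listed in the sorted table, 0 otherwise (O(log n)). */
static int in_class(uint32_t wc, const uint32_t *table, size_t n)
{
    return bsearch(&wc, table, n, sizeof table[0], cmp_u32) != NULL;
}

static int is_digit_code(uint32_t wc)
{
    return in_class(wc, digit_codes,
                    sizeof digit_codes / sizeof digit_codes[0]);
}
```

Since the members of a class tend to come in contiguous runs, a variant storing sorted [first, last] ranges instead of single codes would likely shrink the tables a good deal while keeping the same logarithmic search. Either way, new ISO-10646 additions would mean regenerating the tables rather than changing the code.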

Is there a better way? Is there a system to this? What I mean is, is there
some way of knowing that if, for example, a specific bit in the character
code is set, it is a digit? Or if another bit is set, it is an alphabetic
letter?

I would appreciate any suggestions that may help me and other FreeBSD
developers to add this functionality to our C library.

Thank you,

Adam



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT