Re: "uctype.h": a Unicode-based character classification API

From: John Cowan (cowan@locke.ccil.org)
Date: Wed Feb 11 1998 - 16:58:35 EST

Next message: Bernard CHOMBART: "Re: European Currency Symbol"
Previous message: Mark Leisher: ""Unidata 1.0" is now "Ucdata 1.0""
Maybe in reply to: John Cowan: ""uctype.h": a Unicode-based character classification API"

Tom Garland scripsit:

> > As I am dissatisfied with the constraints that the POSIX "ctype.h"
> > classification API puts on characters, and its over-specificity
> > compared with the Unicode model,
>
> I'd be interested in seeing a critique of ctype.h if you have one
> or can point me to one.

Well, let's see. POSIX "ctype.h" knows but two cases, upper and lower,
whereas Unicode knows three, titlecase being the third. In POSIX, only
the European (ASCII) digits 0-9 can pass "isdigit", whereas Unicode has
many sets of digits, all putatively equal.
In POSIX "ctype.h", that which is "alnum" but not "alpha" must be a
"digit", but Unicode is aware that not all numbers are digits, nor
are all letters alphabetic. Unicode groks spacing and non-spacing
marks, but POSIX comprehends them not.
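
The distinctions above can be seen by looking up Unicode general
categories directly. What follows is only a rough sketch, using
Python's standard "unicodedata" module rather than any proposed
"uctype.h" interface; the sample characters and the helper name
"classify" are my own picks for illustration, and reading "letters
that are not alphabetic" as category Lo ideographs is an assumption.

```python
import unicodedata

# Sample characters are illustrative assumptions, chosen to show one
# instance of each distinction; none of them is named in the post.
SAMPLES = {
    "uppercase digraph DZ with caron": "\u01C4",   # category Lu
    "titlecase digraph Dz with caron": "\u01C5",   # category Lt (the third case)
    "lowercase digraph dz with caron": "\u01C6",   # category Ll
    "ASCII digit seven": "7",                      # category Nd
    "Arabic-Indic digit three": "\u0663",          # category Nd, a digit outside ASCII
    "Roman numeral twelve": "\u216B",              # category Nl, a number but not a digit
    "vulgar fraction one half": "\u00BD",          # category No, a number but not a digit
    "CJK ideograph": "\u4E2D",                     # category Lo, a caseless letter
    "combining acute accent": "\u0301",            # category Mn, non-spacing mark
    "Devanagari vowel sign AA": "\u093E",          # category Mc, spacing mark
}


def classify(ch):
    """Return (general category, decimal digit value or None) for one character."""
    return unicodedata.category(ch), unicodedata.decimal(ch, None)


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        category, decimal_value = classify(ch)
        print(f"U+{ord(ch):04X}  {category}  decimal={decimal_value}  {label}")
```

Under the "ctype.h" model, the Roman numeral and the fraction have
nowhere to go: they are neither "alpha" nor "digit", so they cannot
be "alnum" either, though both are plainly numbers.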

<p lang=en_TH>
Etcetera, etcetera, etcetera.
</p>

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (FW 16.5)

