Re: Character properties

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Mon Oct 23 2000 - 14:15:44 EDT


Mon, 23 Oct 2000 09:48:52 +0100, Marco.Cimarosti@icl.com <Marco.Cimarosti@icl.com> wrote:

> > isDigit: Nd
> > isHexDigit: '0'..'9', 'A'..'F', 'a'..'f'
> > isDecDigit: '0'..'9'
> > isOctDigit: '0'..'7'
>
> The definition "Nd" is what I would have proposed for isDecDigit.

The name isDecDigit is confusing indeed... isAsciiDigit?
But it would be inconsistent with the rest.

> In general, I would consider any script's digit for decimal and octal
> numbers. Not so for hex numbers, that are probably strictly bound
> to computer programming languages and, hence, to the Latin script.

Octal digits are bound to programming languages as much as hex digits.
I'm not sure about the names for Nd and '0'..'9', but I think that there
is no need for separate Nd-less-than-8 and '0'..'7' tests: '0'..'7'
is enough, since that is what programming languages and formats with
C-like string escapes use.

> What is the meaning of isDigit? The intuitive meaning would be "Any
> kind of digit, as defined by the three specific functions below".

Any kind of digit which forms numbers in the positional decimal system,
convertible to an integer by the standard function digitToInt.

Actually digitToInt also understands 'A'..'F' and 'a'..'f' as hex
digits.
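
To illustrate the hex-digit behaviour of digitToInt, here is a minimal
sketch. It assumes Data.Char as shipped with GHC, where isHexDigit and
digitToInt accept only the ASCII hex digits; the library discussed in
this thread may differ. The name hexDigitValue is hypothetical.

```haskell
import Data.Char (digitToInt, isHexDigit)

-- Hypothetical helper: the value of an ASCII hex digit,
-- or Nothing for any other character.
-- Assumption: isHexDigit accepts exactly '0'..'9', 'A'..'F', 'a'..'f'.
hexDigitValue :: Char -> Maybe Int
hexDigitValue c
  | isHexDigit c = Just (digitToInt c)
  | otherwise    = Nothing
```

Guarding with isHexDigit keeps the function total, since digitToInt
itself raises an error on a character it does not understand.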

> So, I would say:

This does not provide any name for '0'..'9'. Nor for '0'..'9' +
'A'..'F' + 'a'..'f'. Since they are commonly used in existing formats
and programming languages, I'm afraid it's not enough. OTOH there
should not be too many variants that nobody will use.
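
The four sets under discussion could be sketched like this. All four
function names are hypothetical (isAsciiDigit is only the name floated
above), and the sketch assumes a Data.Char that exposes generalCategory
and the DecimalNumber (Nd) category.

```haskell
import Data.Char (GeneralCategory (DecimalNumber), generalCategory)

-- Any decimal digit of any script: general category Nd.
isNdDigit :: Char -> Bool
isNdDigit c = generalCategory c == DecimalNumber

-- ASCII decimal digits only: '0'..'9'.
isAsciiDigit :: Char -> Bool
isAsciiDigit c = c >= '0' && c <= '9'

-- ASCII octal digits only: '0'..'7'.
isAsciiOctDigit :: Char -> Bool
isAsciiOctDigit c = c >= '0' && c <= '7'

-- ASCII hex digits only: '0'..'9', 'A'..'F', 'a'..'f'.
isAsciiHexDigit :: Char -> Bool
isAsciiHexDigit c =
  isAsciiDigit c || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')
```

For example, U+0663 (ARABIC-INDIC DIGIT THREE) satisfies isNdDigit but
none of the three ASCII tests.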

> > isUpper: Lu, Lt
> > isLower: Ll
>
> I would say that "Lt" letter are *both* uppercase and lowercase.

An interesting point of view! Looks strange, but I must think about it.

Some derived tests would become incorrect: "all letters are lowercase"
could no longer be checked by "all isLower", only by "not . any isUpper".
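
A minimal sketch of the difference, assuming Lt counts as both uppercase
and lowercase. The names isUpperLt, isLowerLt, allLowerNaive and
allLowerStrict are hypothetical, and the sketch assumes a Data.Char that
exposes generalCategory.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical predicates in which Lt is both uppercase and lowercase.
isUpperLt, isLowerLt :: Char -> Bool
isUpperLt c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]
isLowerLt c = generalCategory c `elem` [LowercaseLetter, TitlecaseLetter]

-- Accepts a titlecase letter, because Lt now counts as lowercase.
allLowerNaive :: String -> Bool
allLowerNaive = all isLowerLt

-- Rejects a titlecase letter, because Lt still counts as uppercase.
allLowerStrict :: String -> Bool
allLowerStrict = not . any isUpperLt
```

On a string holding only U+01C5 (a titlecase letter) allLowerNaive gives
True and allLowerStrict gives False, while both give True on "abc". The
two also differ on non-letters, so the comparison only makes sense for
strings of letters.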

> Or alternatively, if you can (and wish to) add a new API entry:

I think that this phenomenon is too rare to deserve a separate
entry. Most people would not use it in practice.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK


