At 13:07 97-02-03 -0800, Mark Davis wrote:
>I realize as you do that filtering unknown characters is a problem.
And filtering may be due to all kinds of things, including hardwired range
>However, I think you are missing my point. In regular expressions, you
>are producing a pattern that will match certain characters. Rather than
>list them all, there is a shorthand that people use, which is to list
>ranges of code points. My point is:
>For this *particular* application, usually when people list "a-z", they
>really mean "Latin Letters", or often, just "Letters".
If that is what they mean, they may well miss the thorns (þ, Þ), which is my
point. If they mean letters, then they should not use a hard-wired "range" function.
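
To make the thorn example concrete, here is a minimal sketch in Python (a language chosen only for illustration; nothing in this thread refers to it). The names HARDWIRED, matches_range and is_letters are hypothetical. The sketch contrasts a hard-wired "a-z" range with a test based on Unicode character properties:

```python
import re

# Hard-wired code point range: only the 26 unaccented lowercase ASCII letters.
HARDWIRED = re.compile(r"[a-z]+")


def matches_range(word: str) -> bool:
    """True if the whole word consists of characters in the range a-z."""
    return HARDWIRED.fullmatch(word) is not None


def is_letters(word: str) -> bool:
    """True if every character is a letter according to Unicode properties."""
    return word != "" and all(ch.isalpha() for ch in word)


if __name__ == "__main__":
    # "þing" and "été" are made of letters, yet the range rejects both.
    for w in ["thing", "þing", "été"]:
        print(w, matches_range(w), is_letters(w))
```

The range accepts only "thing", while the property-based test accepts all three words.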
>The latter is
>actually usually BETTER for the problems that you list than restricting
>it to a particular range for a particular language.
We are on the same wavelength on this.
>Even better would be to look at common practice and separate out *more*
>higher level divisions, such as "Vowel" which often arise in regular
Even vowels vary from language to language: y is *always* a vowel
in French, w is *never* one... so that has to be localized too... (;
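
A rough sketch of what localizing a "Vowel" class could look like, again in Python. The VOWELS table and the is_vowel helper are hypothetical, and the letter sets are deliberately incomplete assumptions: the French entry follows the claim above that y is a vowel, and the English entry simplifies by leaving y out.

```python
# Hypothetical, incomplete per-language vowel tables (illustrative assumptions,
# not authoritative linguistic data).
VOWELS = {
    "en": set("aeiou"),                  # simplification: y left out for English
    "fr": set("aeiouyàâéèêëîïôùûüÿ"),    # y counted as a vowel, w is not
}


def is_vowel(ch: str, lang: str) -> bool:
    """True if ch is listed as a vowel for the given language code."""
    return ch.lower() in VOWELS[lang]


if __name__ == "__main__":
    for lang in ("en", "fr"):
        print(lang, is_vowel("y", lang), is_vowel("w", lang))
```

The same character gets a different answer depending on the language, so a single built-in "Vowel" class would not be enough.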
>However, there are times where software does only recognize certain
>letters, and has to be able to do so. A C compiler, unlike Java, doesn't
>allow accented letters in identifiers. If you have to mimic that
>behavior, then you want to use a precise description of the characters.
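
The "precise description" case can be sketched the same way. The pattern below assumes the classic C rule (an ASCII letter or underscore, followed by ASCII letters, digits or underscores). C_IDENTIFIER and is_c_identifier are hypothetical names, and Python's built-in str.isidentifier() stands in here for a language that accepts accented letters:

```python
import re

# Assumption: classic C-style identifiers, restricted to ASCII letters,
# digits and underscore, and not starting with a digit.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_c_identifier(name: str) -> bool:
    """True if name fits the ASCII-only identifier pattern above."""
    return C_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["count_1", "café", "1abc"]:
        # str.isidentifier() applies Python's own Unicode-aware rule.
        print(name, is_c_identifier(name), name.isidentifier())
```

Here the explicit range is the point: "café" must be rejected to mimic the compiler, even though it consists entirely of letters.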
Even this practice of programming languages is questionable and was made for
English-speaking programmers only... but that is another debate... I don't
want to enter into antediluvian debates... Let's just conclude that new
programming languages should not reproduce those bad-taste and parochial