RE: What is this "case folding"?

From: Marco.Cimarosti@icl.com
Date: Tue Jul 11 2000 - 06:49:48 EDT


Robert Lozyniak wrote:
> If it is what I think it is, I don't want it in English.
> How could it tell "aids" from "AIDS", for instance?
> Or "joy" from "Joy"(name)?

(C'mon, 11BB, you were supposed to know this one ;-)

Case folding (or case conversion) is the process of changing letters from
one case to the other (or to *an* other, in Unicode). It is present as a
menu command in most word processors or text editors, and as built-in
functions in most programming languages. If you don't want it, the best
policy is not to use it.
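
As a rough illustration of the built-in functions, here is a minimal
sketch in Python (the language is only my choice for the example). I am
assuming that "*an* other" case means titlecase, which Unicode defines in
addition to upper and lower case; U+01C6 is one of the few letters with
three distinct forms.

```python
# Built-in case conversion on ordinary strings.
acronym = "AIDS"
name = "Joy"
lowered_acronym = acronym.lower()   # the acronym/word distinction is lost
lowered_name = name.lower()         # the name/noun distinction is lost
print(lowered_acronym, lowered_name)

# A letter with three case forms (assumed to be what "*an* other" refers to).
dz_lower = "\u01c6"            # LATIN SMALL LETTER DZ WITH CARON
dz_upper = dz_lower.upper()    # U+01C4, the all-capitals form
dz_title = dz_lower.title()    # U+01C5, the titlecase form
print([hex(ord(c)) for c in (dz_lower, dz_upper, dz_title)])
```

Nothing here happens unless the conversion is called explicitly, which is
the point made above.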

Converting uppercase letters to lowercase is normally not very useful. But
the opposite (called "capitalization") is often used to achieve EMPHASIS,
especially in headings.

(I personally don't like capitalization either: partly because the uniform
height of the letters makes reading more difficult, and partly because of
the problem you mentioned: the difference between upper and lower case is
sometimes meaningful, and capitalization loses it.)

A temporary case folding "behind the scenes" is also used to achieve
case-insensitive comparisons in a variety of processes (search and replace,
sorting, dictionary lookup, etc.). In this case, lowercasing is normally
preferred to uppercasing because it is less destructive: there are more lc
letters than uc letters, so it is unlikely that different uc letters map to
the same lc letter, whereas it is common that different lc strings map to
the same uc string -- e.g. "ß" and "ss" are both capitalized as "SS".

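A minimal Python sketch of such a hidden folding, comparing the two
directions. The function names are made up for the example, and the result
depends on the case mappings that the language's string methods implement.

```python
def equal_ignoring_case_lower(a: str, b: str) -> bool:
    """Compare two strings after temporarily lowercasing both."""
    return a.lower() == b.lower()


def equal_ignoring_case_upper(a: str, b: str) -> bool:
    """Compare two strings after temporarily uppercasing both."""
    return a.upper() == b.upper()


# Either direction treats "AIDS" and "aids" as equal.
print(equal_ignoring_case_lower("AIDS", "aids"))
print(equal_ignoring_case_upper("AIDS", "aids"))

# Uppercasing also merges "ß" with "ss"; lowercasing keeps them apart.
print(equal_ignoring_case_upper("ß", "ss"))
print(equal_ignoring_case_lower("ß", "ss"))
```

The original strings are never modified; the folded copies exist only for
the duration of the comparison.
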
To achieve case-insensitive (or otherwise "loose") comparison, an
alternative to hidden case folding is to use collation tables, which assign
one or more levels of "weight" keys to each character. One good example of
this is UTR#10 (http://www.unicode.org/unicode/reports/tr10/).
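
To give an idea of how weight levels work, here is a toy sketch in Python.
The table, the two levels and the names TOY_WEIGHTS and sort_key are all
invented for the example; this is not the UTR#10 algorithm or its data,
only the general idea of comparing first-level weights before case weights.

```python
import string

# Hypothetical toy table: each letter gets (primary weight, case weight).
# Upper- and lowercase forms share the primary weight and differ in case.
TOY_WEIGHTS = {}
for i, ch in enumerate(string.ascii_lowercase, start=1):
    TOY_WEIGHTS[ch] = (i, 0)
    TOY_WEIGHTS[ch.upper()] = (i, 1)


def sort_key(text, levels=2):
    """Build a key: all primary weights first, then (optionally) case weights.

    Only ASCII letters are in the toy table; anything else raises KeyError.
    """
    entries = [TOY_WEIGHTS[ch] for ch in text]
    key = tuple(primary for primary, _ in entries)
    if levels >= 2:
        # 0 is lower than any primary weight, so it works as a separator.
        key += (0,) + tuple(case for _, case in entries)
    return key


# With one level, case is ignored; with two levels it breaks ties.
print(sort_key("AIDS", levels=1) == sort_key("aids", levels=1))
print(sort_key("AIDS") == sort_key("aids"))
print(sorted(["Joy", "joy", "AIDS", "aids"], key=sort_key))
```

A loose search would compare one-level keys, while sorting would use both
levels, so "aids" and "AIDS" stay adjacent without becoming identical.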

_ Marco


