Re: LC_CTYPE locale category and character sets.

From: Alain LaBonté
Date: Mon Aug 03 1998 - 16:07:49 EDT

At 22:45 98-07-16 -0700, Geoffrey Waigh wrote:
>Nelson H. F. Beebe wrote:
>> Keld Jørn Simonsen writes:
>> >> Yes, some French maintain that the uppercase version of lowercase
>> >> accented letters do not have accents.
>> That is traditional in the French in France, but not in the French in
>> Canada, where accents are preserved in uppercase letters.
>> At a conference I attended in Paris a few years ago, a French
>> typographer reported that the dropping of accents in uppercased words
>> often led to confusion, citing <<Congrès de Députes>> ->
>> <<CONGRES DE DEPUTES>>, which is pronounced entirely differently.

[Geoffrey] :
>So as I thought, the pronunciation does not change when the text
>is uppercased, they probably would want string searches to not
>consider the unaccented and "virtual" accented words to match and
>would probably also want intelligent software to restore accents when
>lowercasing the text.

[Alain] :
Search is another story, and it has its own peculiarities in upper case and
in lower case in French.

Sometimes one wants to do precise searches, for example to distinguish "dû"
(due) from "du" (the contraction of "de" and "le", meaning "of the", in the
masculine form; it is the 22nd most used word in French, and therefore of
little significance in searches).

Sometimes one wants to do fuzzy searches, since many word roots lead to
variations in accentuation: hence if I search with the wildcard "cle*" I
want to retrieve both "clé" and "clef", two spellings (the first one more
modern) of the same word. If I search for "revele*" I want to retrieve
"révèle", "révélé", "révéler" and so on.

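The fuzzy wildcard search described above can be sketched as follows. This is
only an illustration, not a description of any particular search engine: it
assumes that stripping combining marks after canonical decomposition (NFD) is
an acceptable way to ignore accents, and the names fold_accents and
fuzzy_match are hypothetical.

```python
import fnmatch
import unicodedata


def fold_accents(text):
    """Remove accents by decomposing (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_match(pattern, words):
    """Return the words that match a shell-style wildcard pattern,
    ignoring both accents and case (hypothetical helper)."""
    folded_pattern = fold_accents(pattern).lower()
    return [
        word
        for word in words
        if fnmatch.fnmatchcase(fold_accents(word).lower(), folded_pattern)
    ]


words = ["clé", "clef", "révèle", "révélé", "révéler", "dû", "du"]
print(fuzzy_match("cle*", words))     # ['clé', 'clef']
print(fuzzy_match("revele*", words))  # ['révèle', 'révélé', 'révéler']
```

Folding both the pattern and the candidates means it does not matter whether
the query itself was typed with or without accents.
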
All of this is compounded by upper- and lowercase requirements (for both
precise and fuzzy search) and by whether or not special characters are taken
into consideration.
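
One way to picture how these requirements combine is to make accent and case
sensitivity two independent switches on the comparison key. The sketch below
is an assumption-laden illustration (search_key and find are hypothetical
names, and the handling of special characters is left out entirely):

```python
import unicodedata


def search_key(text, ignore_accents=False, ignore_case=False):
    """Build a comparison key under the chosen precision (hypothetical helper)."""
    if ignore_accents:
        # Decompose, then drop the combining marks that carry the accents.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    else:
        # Keep accents, but make composed and decomposed input compare equal.
        text = unicodedata.normalize("NFC", text)
    if ignore_case:
        text = text.casefold()
    return text


def find(query, words, **options):
    """Return the words whose key equals the key of the query."""
    key = search_key(query, **options)
    return [word for word in words if search_key(word, **options) == key]


words = ["du", "dû", "DU", "DÛ"]
print(find("dû", words))                                         # ['dû']
print(find("dû", words, ignore_case=True))                       # ['dû', 'DÛ']
print(find("dû", words, ignore_accents=True))                    # ['du', 'dû']
print(find("dû", words, ignore_accents=True, ignore_case=True))  # all four
```

Note that a precise, case-insensitive search for "dû" does not find "DU", so
text uppercased without its accents would be missed unless the search is also
made accent-insensitive.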

Alain LaBonté
