Re: LC_CTYPE locale category and character sets.

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 16 1998 - 15:54:46 EDT


Michael Everson asked:

> >Mark Davis pointed at the Unicode Standard for the full answer.
> >
> >The short answer is that the Unicode Character Database (and you
> >should be using Version 2.1.2 now) gives all the default one-to-one
> >case mappings. Some case mappings (e.g., for French and for Turkish)
> >differ from the defaults.
>
> French?

This refers to the practice, which may now be declining, of uppercasing
accented French vowels to accented capital letters in some French
locales but to unaccented capital letters in others.
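A minimal sketch of the two conventions, assuming a hypothetical
`keep_accents` flag stands in for the locale choice (the post does not
say which French locales follow which convention). It uses Python's
`unicodedata` module and strips diacritics only from vowels, so the
cedilla on "c" is kept.

```python
import unicodedata

# Hypothetical tailoring: which locales keep or drop accents is an assumption.
VOWELS = set("aeiouyAEIOUY")

def strip_vowel_accents(text: str) -> str:
    """Remove diacritics from vowels only; leave e.g. the cedilla on c intact."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        parts = unicodedata.normalize("NFD", ch)  # base letter + combining marks
        if parts[0] in VOWELS:
            parts = "".join(p for p in parts if not unicodedata.combining(p))
        out.append(parts)
    return unicodedata.normalize("NFC", "".join(out))

def upper_fr(text: str, keep_accents: bool = True) -> str:
    """Uppercase French text under one of two assumed locale conventions."""
    if keep_accents:
        return text.upper()                   # default mapping: é -> É
    return strip_vowel_accents(text).upper()  # tailored mapping: é -> E

print(upper_fr("\u00e9l\u00e8ve"))                      # ÉLÈVE
print(upper_fr("\u00e9l\u00e8ve", keep_accents=False))  # ELEVE
print(upper_fr("gar\u00e7on", keep_accents=False))      # GARÇON
```

The default branch is just the one-to-one mapping from the character
database; the tailored branch is what a locale-specific override would
have to add on top of it.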

>
> How do you do reversible conversions from lowercase to uppercase and back,
> though? Or is that "outside the scope" of coding in your view?

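On a character-by-character basis, case conversion is generally
reversible for the one-to-one case mappings, but not necessarily for
the one-to-many mappings. For example, uppercasing U+00DF LATIN SMALL
LETTER SHARP S gives "SS", which lowercases to "ss" rather than back
to "ß".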
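A quick way to see this is to round-trip single characters through
uppercase and back. This sketch uses Python's built-in `str.upper()`
and `str.lower()`, which apply the full Unicode case mappings
(including one-to-many ones) from whatever Unicode version the
interpreter ships, not Version 2.1.2; `round_trips` is a hypothetical
helper name.

```python
def round_trips(ch: str) -> bool:
    """True if lowercasing the uppercase form of ch gives back ch."""
    return ch.upper().lower() == ch

# "e" and "é" use one-to-one mappings; "ß" uppercases to two letters, "SS".
for ch in ["e", "\u00e9", "\u00df"]:
    up = ch.upper()
    print(f"{ch!r} -> {up!r} -> {up.lower()!r}  round trip: {round_trips(ch)}")
```

For "ß" the chain is "ß" to "SS" to "ss", so the round trip fails, while
"e" and "é" come back unchanged.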

On a string basis, case conversion is data-destructive: it levels the
distinction between lowercase and uppercase letters that may have
coexisted in the original data. Because of this, case conversion of
strings is not generally reversible.
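The loss of information can be made concrete: if two distinct strings
uppercase to the same result, no function can recover which one you
started with. A small sketch, with a hypothetical `upper_collisions`
helper and made-up sample words:

```python
from collections import defaultdict

def upper_collisions(words):
    """Group words by their uppercase form and keep only groups with more
    than one member, i.e. distinct inputs that uppercasing merged."""
    groups = defaultdict(set)
    for w in words:
        groups[w.upper()].add(w)
    return {up: originals for up, originals in groups.items() if len(originals) > 1}

words = ["Polish", "polish", "MacDonald", "Macdonald", "Unicode"]
print(upper_collisions(words))
print("MacDonald".upper().lower())  # macdonald: the internal capital is gone
```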

Applications can work around this by keeping the source data so that
the change can be reversed. Word 97 illustrates this: you can set a
selection of text to ALL CAPS, then turn off that style to get the
text back in its original form.
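The workaround amounts to treating ALL CAPS as a display attribute
rather than an edit to the stored characters. A minimal sketch, using a
hypothetical `StyledText` class; this is not how Word 97 is actually
implemented, only an illustration of keeping the source data.

```python
from dataclasses import dataclass

@dataclass
class StyledText:
    """Hypothetical text run with an 'all caps' display style.

    The stored text is never modified; the style only affects rendering,
    so turning it off restores the original casing exactly."""
    text: str
    all_caps: bool = False

    def render(self) -> str:
        return self.text.upper() if self.all_caps else self.text

run = StyledText("Case Mappings in Unicode")
run.all_caps = True
print(run.render())   # CASE MAPPINGS IN UNICODE
run.all_caps = False
print(run.render())   # Case Mappings in Unicode
```

The cost is that every consumer of the text must know about the style;
anything that copies only `render()` output gets the lossy uppercase
version.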

But yes, generally, reversibility of string conversions is
outside the scope of character encoding.

--Ken


