Re: Unicode character cases

From: John Cowan
Date: Tue Nov 24 1998 - 15:52:57 EST

Otto Stolz scripsit:

> The question must also be what price you have to pay, in other
> areas, for this optimization of a particular operation.

Just so.

> Capitalizing rules are, to a large extent, language-specific.

I would rewrite this statement as "Capitalizing rules are,
to a small but non-negligible extent, language-specific.
Therefore, the Unicode case mappings are non-normative,
though generally believed to be useful."
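
To make the "small but non-negligible" part concrete, here is a rough
sketch of the best-known exception, the Turkish dotted and dotless i.
The helper names turkish_upper and turkish_lower and the two small
tables are hypothetical, made up for this illustration; they are not a
standard library API and cover only this one pair of letters.

```python
# Sketch only: turkish_upper/turkish_lower and the tables below are
# hypothetical helpers for illustration, not a standard API.

# Turkish pairs dotted i with dotted capital I (U+0130) and
# dotless i (U+0131) with plain capital I.
_TR_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
_TR_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkish_upper(s: str) -> str:
    # Apply the Turkish exceptions first, then the default mapping.
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s: str) -> str:
    # Same idea in the other direction.
    return s.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    word = "diyarbak\u0131r"
    print(word.upper())         # default, language-neutral mapping
    print(turkish_upper(word))  # mapping with the Turkish exceptions
```

The default mapping folds both i letters onto the same capital I, so
the distinction is lost; the tailored mapping keeps the two apart in
both directions.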

> This means that
> - ISO 8859-3 (and probably other legacy) data cannot be easily
> converted to UCS:

Use of 8859-3 for Turkish is deprecated: 8859-9 (Latin-5) and its
Windows variant CP1254 are the most likely charsets.
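
As a rough sketch of why the charset label matters, the snippet below
compares how the three charsets encode the Turkish-specific letters.
It assumes Python's standard codec names iso8859_3, iso8859_9 and
cp1254; the function names encodings and to_ucs are made up for this
illustration.

```python
# Sketch: compare how three charsets encode the Turkish-specific letters.
# The codec names are Python's standard ones; the function names are
# hypothetical.

# Dotted capital I, dotless i, G breve, g breve, S cedilla, s cedilla.
TURKISH_LETTERS = "\u0130\u0131\u011e\u011f\u015e\u015f"

CHARSETS = ("iso8859_3", "iso8859_9", "cp1254")


def encodings(ch: str) -> dict:
    # Map each charset name to the byte(s) it uses for this character.
    return {name: ch.encode(name) for name in CHARSETS}


def to_ucs(data: bytes, charset: str) -> str:
    # Conversion is a plain decode, provided the charset label is right.
    return data.decode(charset)


if __name__ == "__main__":
    for ch in TURKISH_LETTERS:
        print(f"U+{ord(ch):04X}", encodings(ch))
```

For these six letters, 8859-9 and CP1254 use the same byte values,
while 8859-3 puts them elsewhere, so data decoded under the wrong one
of the two labels comes out wrong for exactly the Turkish-specific
letters.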

> In English, lowercasing will not get the acronyms right; there may
> even be cases where uppercase vs. lowercase spellings make a
> difference, e.g. a proper name or an acronym vs. an ordinary noun.

There would be quite a difference, also, between polish remover
and a Polish remover.

