Re: Unicode character cases

From: John Cowan (cowan@locke.ccil.org)
Date: Tue Nov 24 1998 - 15:52:57 EST


Otto Stolz scripsit:

> The question must also be what price you have to pay, in other
> areas, for this optimazation of a particular operation.

Just so.

> Capitalizing rules are, to a large extent, language-specific.

I would rewrite this statement as "Capitalizing rules are,
to a small but non-negligible extent, language-specific.
Therefore, the Unicode case mappings are non-normative,
though generally believed to be useful."
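
The Turkish dotted and dotless i is the usual example of that small
extent. What follows is only a minimal sketch in Python, not anything
the standard prescribes: upper_for and lower_for are hypothetical
helper names, and the two-letter language tag is an assumption. The
built-in str.upper() and str.lower() apply the default,
language-neutral mappings; the helpers adjust the four i letters first
when the caller asks for Turkish or Azerbaijani.

```python
# Hypothetical helpers (names and language tags are assumptions).
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I

TURKIC = ("tr", "az")

def upper_for(text: str, lang: str = "") -> str:
    """Uppercase text, applying the Turkic i rule when lang asks for it."""
    if lang in TURKIC:
        # Dotted small i takes a dotted capital; dotless small i takes plain I.
        text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()

def lower_for(text: str, lang: str = "") -> str:
    """Lowercase text, applying the Turkic i rule when lang asks for it."""
    if lang in TURKIC:
        # Dotted capital maps to plain i; plain capital I maps to dotless i.
        text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()
```

Called without a language tag, both helpers behave exactly like the
default mappings, which is the sense in which the defaults remain
"generally useful" outside the tailored languages.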

> This means that
> - ISO 8859-3 (and probably other legacy) data cannot be easily
> converted to UCS:

Use of 8859-3 for Turkish is deprecated: 8859-9 (Latin-5) and its
Windows variant CP1254 are the most likely charsets.
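
For what it is worth, moving Turkish text among these charsets by way
of Unicode can be sketched as below. This is a sketch under stated
assumptions: it uses Python's standard codec names (iso8859_3,
iso8859_9, cp1254), and recode is a hypothetical helper name, not part
of any library.

```python
def recode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from one legacy charset and re-encode them in another."""
    # Decoding yields Unicode text; encoding raises UnicodeEncodeError
    # if the target charset lacks one of the characters.
    return data.decode(src).encode(dst)

# Sample containing both dotted capital I and dotless small i.
sample = "\u0130stanbul \u0131l\u0131k"

as_latin3 = sample.encode("iso8859_3")   # the deprecated choice for Turkish
as_latin5 = recode(as_latin3, "iso8859_3", "iso8859_9")
as_cp1254 = recode(as_latin3, "iso8859_3", "cp1254")
```

The Turkish letters sit at different byte positions in Latin-3 and
Latin-5, so the charset label has to be right before any case mapping
is attempted.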

> In English, lowercasing will not get the acronyms right; there may
> even be cases where uppercase vs. lowercase spellings make a
> difference, e.g. a proper name, or an acronym, vs. an ordinary noun.

There would be quite a difference, also, between polish remover
and a Polish remover.
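
A small Python illustration of the loss, using only the built-in
string methods; same_ignoring_case is a hypothetical helper name
introduced here for the example.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after discarding case distinctions."""
    return a.casefold() == b.casefold()

# The two phrases differ only in the case of one letter, so a
# case-insensitive comparison cannot tell them apart.
collides = same_ignoring_case("polish remover", "Polish remover")

# Lowercasing and then re-capitalizing does not restore an acronym.
round_trip = "NATO".lower().capitalize()
```

Once the case has been folded away, nothing in the string says which
reading was meant; the information has to be kept somewhere else if it
matters.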

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


