Re: Character classification and casing and locales

From: Michael Everson (everson@indigo.ie)
Date: Mon Dec 02 1996 - 10:18:39 EST


Otto wrote:
>What about German, where sharp-s "ß" (00DF) has to be uppercased to double
>S "SS" (0053+0053)? I.e., one lower-case letter becomes two uppercase
>letters. This may pose a problem, in particular with monospaced fonts where
>users tend to expect that the page layout is preserved under the uppercasing
>operation. It will pose an even greater problem for programmers, who normally
>expect that the number of characters, hence the storage requirements of a
>character string, does not change under the uppercasing operation.

Adding a CAPITAL SHARP S, which could be represented by a glyph of "SS"
could solve this perennial problem. I maintain that this is NOT just a joke
solution, because I have seen German periodicals where titles were written
in all caps and a large SHARP S had been designed in the type face to
harmonize with all the other capital letters. In other words, a CAPITAL
SHARP S exists already in the real world anyway. Adding this character
could solve the problem of linguistic analysis for casing operations
(particularly since the rules governing the use of the sharp s are changing
or have been changed).
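
To make the length problem Otto describes concrete, here is a minimal Python sketch. It assumes an implementation whose upper() applies full Unicode case mappings, as Python 3's str.upper() does; the helper name uppercase_growth is made up for this illustration.

```python
# Sketch: uppercasing can change the number of characters in a string,
# and the change cannot be undone by lowercasing again.

def uppercase_growth(text: str) -> int:
    """Return how many characters the string gains when uppercased.

    Hypothetical helper for illustration only.
    """
    return len(text.upper()) - len(text)


word = "stra\u00dfe"            # "straße", with sharp s (U+00DF)
shouted = word.upper()           # sharp s becomes the two letters "SS"

print(shouted)                   # STRASSE
print(uppercase_growth(word))    # 1

# The round trip is lossy: lowercasing "SS" gives "ss", not sharp s.
round_trip = shouted.lower()
print(round_trip == word)        # False
```

A buffer sized for the lower-case string is therefore one character too short for the upper-case one, and software cannot recover the original spelling from the capitalized form without linguistic knowledge.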

>In the Latin Extended alphabet, there are more lower-case-only characters,
>such as kra (0138), turned delta (018D), hv (0195), jota (0196), lambda
>with stroke (019B), t with palatal hook (01AB), and so forth. I do not know
>which languages use these letters, but I fear that some languages will
>handle the capitalization of these characters differently from others.

KRA's capital form is K' (K WITH APOSTROPHE). KRA was used in Greenlandic.
It would be handy for this capital form to be included in the standard. TURNED
DELTA has no capital; it is an IPA character. HV (whose real name is
HWAIR) does have a capital form, and I submitted a request that its capital
be added to the standard. HWAIR is used in Gothic.

>On the other hand, it is not clear whether turned e
>(01DD) can universally be uppercased (to reversed E (018E), or to Schwa
>(018F), as schwa (0259) is an IPA character, hence will neither be used in
>standard orthography nor ever be uppercased). How will these cases be
>handled?

Schwa is used in Azerbaijani in Latin script.
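
Anyone wanting to see how a particular implementation handles the letters discussed above can query it directly. The following is a rough Python sketch, not a statement of what any standard requires; the helper name upper_codepoints is invented here, and the answers depend on the version of the Unicode data shipped with the interpreter, so other letters from Otto's list may give different results on different versions.

```python
import unicodedata


def upper_codepoints(ch: str) -> list[str]:
    """Return the code points (as hex strings) that ch uppercases to.

    Hypothetical helper for illustration only. A character with no
    uppercase mapping comes back unchanged.
    """
    return [format(ord(c), "04X") for c in ch.upper()]


# Letters mentioned in this thread, by code point.
for ch in ["\u00df", "\u0138", "\u01dd", "\u0259"]:
    name = unicodedata.name(ch, "<unnamed>")
    print(format(ord(ch), "04X"), name, "->", upper_codepoints(ch))
```

In Python 3, sharp s maps to two code points (0053 0053), kra maps to itself because it has no capital, turned e maps to 018E, and schwa maps to 018F.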

--
Michael Everson, Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire (Ireland)
Gutháin:  +353 1 478-2597, +353 1 283-9396
http://www.indigo.ie/egt
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


