On Nov 26, 16:16, Rick McGowan wrote:
> The Unicode FTP site, and the standard, provides a default upper/lower case
> table. I think that this particular operation is typically the same
> everywhere, with the exception of the dotted upper-case "I" in Turkish.

What about German, where sharp-s "ß" (00DF) has to be uppercased to
double S "SS" (0053+0053)? I.e., one lower-case letter becomes two
upper-case letters. This may pose a problem, in particular with
monospaced fonts, where users tend to expect that the page layout is
preserved under the uppercasing operation. It will pose an even greater
problem for programmers, who normally expect that the number of
characters, and hence the storage requirement of a character string,
does not change under the uppercasing operation.

In German, upper-case to lower-case conversion must convert some
instances of double-S to double-s, but other instances to sharp-s. This
can only be decided based on a thorough linguistic analysis; there is no
simple rule based on a character-wise analysis. Likewise, the upper-case
S will have to be lowercased either as s or as long s (017F), if the
latter is used at all (the long s is extinct in standard orthography; it
was officially used until 1942). Hence, a correct implementation of
lowercasing would have to take the language into account.

(Of course, it would be wise to keep all strings in mixed case for later
reference, and to make uppercased, or lowercased, copies for temporary
use only.)

In the Latin Extended alphabet, there are more lower-case-only
characters, such as kra (0138), turned delta (018D), hv (0195), jota
(0196), lambda with stroke (019B), t with palatal hook (01AB), and so
forth. I do not know which languages use these letters, but I fear that
some languages will handle the capitalization of these characters
differently from others.
E.g., I can imagine that some languages will capitalize small l with bar
(019A) to L with stroke (0141), so that the latter cannot universally be
lowercased to l with stroke (0142). On the other hand, it is not clear
whether turned e (01DD) can universally be uppercased: to reversed E
(018E), or to Schwa (018F)? The latter seems possible because schwa
(0259) is an IPA character, hence will neither be used in standard
orthography nor ever be uppercased.

How will these cases be handled? Does anybody know of any other anomaly
in the uppercasing, or lowercasing, of any language?

Best wishes,
  Otto Stolz
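
The sharp-s case described above can be illustrated with a short sketch. It assumes Python 3, which the message itself does not mention; its built-in str.upper() and str.lower() apply language-independent Unicode case mappings, and the sample word is chosen purely for illustration.

```python
# Sketch only: Python 3 is an assumption, not something named in the post.
# str.upper() / str.lower() use language-independent Unicode case mappings.

word = "straße"              # sample word containing sharp-s (00DF)
upper = word.upper()         # sharp-s is expanded to "SS" (0053+0053)
round_trip = upper.lower()   # character-wise lowercasing cannot restore 00DF

print(word, len(word))       # straße 6
print(upper, len(upper))     # STRASSE 7
print(round_trip)            # strasse
print(round_trip == word)    # False
```

The string grows by one character under uppercasing, and lowercasing the result does not give back the original spelling, which is why keeping the mixed-case original and deriving cased copies only for temporary use looks like the safer design.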