L2/05-342
Source: John.Cowan
Date: 2005-10-25
Subject: Lithuanian case folding

Please treat this as a submission to the meeting. Sorry for the delay in sending it. Thanks.

Lithuanian extends the basic Latin alphabet in the following ways: "e" can have a dot above; "a", "e", "i", and "u" can have ogonek; "c", "s", and "z" can have caron; and "u" can have macron. In addition, any vowel can have grave, acute, or tilde to represent stress or vowel length or both; these three marks are usually written only in poetry, in dictionaries, or to resolve crucial ambiguities.

It is a special feature of Lithuanian typography that when a grave, acute, or tilde is placed over "i", the "i" retains its dot. This is handled in the Unicode Standard by placing an explicit combining dot-above between the "i" and the accent.

Although the special case of dotted plus accented "i" in Lithuanian is correctly handled in Unicode case *mapping*, it is not correctly handled in Unicode case *folding*. In order to case-fold a Lithuanian word containing I with a diacritic above, it is necessary to fold it to i + COMBINING DOT ABOVE + diacritic. This is not the case in Turkish/Azeri nor in general text.

The only proposal to resolve this has been to fold COMBINING DOT ABOVE itself to zero. This could not be done in Turkish/Azeri without destroying case folding (though dotted vs. dotless i minimal pairs are rare in Turkish, they do exist), and should not be done for non-Lithuanian uses of dot above either.

The only real alternatives are to ignore the problem and provide suboptimal Lithuanian support, or to bite the bullet and add new Lithuanian folding rules that fold "I" and "I with ogonek" into i + dot above and i with ogonek + dot above respectively. I recommend the latter. (This will break ICU, which currently has a boolean flag for localized casing, Turkic or non-Turkic. This is a classic example of why Boolean flag arguments are a Bad Thing: many binary oppositions turn out not to be so binary after all.)

--
John Cowan jcowan@reutershealth.com
www.ccil.org/~cowan www.reutershealth.com
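
For illustration only, the Lithuanian folding rules recommended above might be sketched in Python as follows. The names FoldMode, fold, and _more_above are hypothetical and do not correspond to ICU or any other library. Two assumptions are made that the text above does not state: the new rules apply only when the "I" is followed by a combining mark above (modeled on the More_Above condition used for Lithuanian in Unicode case mapping), and accents are encoded as combining marks rather than as precomposed letters. A three-valued mode stands in for the boolean flag.

```python
import unicodedata
from enum import Enum


class FoldMode(Enum):
    """Hypothetical replacement for a boolean Turkic/non-Turkic flag."""
    DEFAULT = "default"
    TURKIC = "turkic"
    LITHUANIAN = "lithuanian"


DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def _more_above(text, i):
    """True if text[i] is followed by a combining mark of class 230 (Above)
    before any character of combining class 0 (assumed condition)."""
    for ch in text[i + 1:]:
        ccc = unicodedata.combining(ch)
        if ccc == 230:
            return True
        if ccc == 0:
            return False
    return False


def fold(text, mode=FoldMode.DEFAULT):
    """Sketch of locale-sensitive case folding; not a complete implementation."""
    out = []
    for i, ch in enumerate(text):
        if mode is FoldMode.TURKIC and ch == "I":
            out.append("\u0131")  # I folds to dotless i
        elif mode is FoldMode.TURKIC and ch == "\u0130":
            out.append("i")  # I with dot above folds to plain i
        elif (mode is FoldMode.LITHUANIAN
              and ch in ("I", "\u012E")  # I, I WITH OGONEK
              and _more_above(text, i)):
            # Keep the dot explicitly under the following accent.
            out.append(ch.lower() + DOT_ABOVE)
        else:
            out.append(ch.casefold())
    return "".join(out)
```

Under these assumptions, fold("I\u0301", FoldMode.LITHUANIAN) and fold("i\u0307\u0301", FoldMode.LITHUANIAN) produce the same string, so the uppercase and lowercase spellings of an accented Lithuanian "i" compare equal after folding, while the default and Turkic modes are left unchanged.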