Re: Default case algorithms

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 25 Jun 2014 14:37:39 +0200

2014-06-25 10:52 GMT+02:00 Daniel Bünzli <daniel.buenzli_at_erratique.ch>:

> On Wednesday, 25 June 2014 at 09:10, Richard Wordingham wrote:
> > Yes - with the caveat that the uppercase mapping of U+0345 is too
> > complicated to define formally.
> >
> > On the other hand, the Lowercase_Mapping property seems to be inadequate
> > for the default lowercase mapping - Greek final sigma is the
> > complication.
>
> So what you seem to imply is that Unicode’s default full casing is
> defined by applying
>
> 1) The unconditional mappings of SpecialCasing.txt
> 2) The conditional mappings of SpecialCasing.txt (there’s only one, the
> final sigma case).
>

There's also the Turkic i or j (a problem with letters that are
usually soft-dotted in the Latin script, except in Turkic languages, where
the case mapping is context-dependent: the right-hand context must be
inspected to see if we need to add a combining dot above).
We could insist that Turkish texts use an explicit combining dot above
after the dotless i (or j), but most Turkish texts just use the plain
ASCII letter, reinterpreting its soft dot as a hard dot that needs to
be added when converting to uppercase and removed when converting to
lowercase. Note also that the dotless i and dotless j are not part of any
case pair.
For Turkish readers, a dotless i followed by an explicit combining dot
above (hard dot) is not recommended; they use the standard ASCII letter
directly, as if it were a precomposed but decomposable letter. In Turkish
texts, a dotless i without a diacritic pairs directly with the capital
ASCII letter I (this mapping to uppercase is *not* contextual, but the
reverse conversion to lowercase *is* contextual).
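
To make the asymmetry concrete, here is a minimal sketch in Python of the
Turkic tailoring described above. The function names turkic_upper and
turkic_lower are hypothetical, and the sketch assumes a simplified context
rule: a capital I counts as "dotted" only when U+0307 follows it
immediately, whereas the real condition in SpecialCasing.txt also allows
certain intervening combining marks. Everything else is delegated to
Python's built-in default case mapping.

```python
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE (the "hard dot")
CAPITAL_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def turkic_upper(s: str) -> str:
    """Uppercase with Turkic tailoring (hypothetical helper).

    No context is needed: each letter maps on its own.
    """
    s = s.replace("i", CAPITAL_I_DOT)  # dotted i keeps its dot
    s = s.replace(DOTLESS_I, "I")      # dotless i pairs with ASCII I
    return s.upper()                   # default mapping for the rest


def turkic_lower(s: str) -> str:
    """Lowercase with Turkic tailoring (hypothetical helper).

    Context is needed: ASCII I must look to its right for U+0307.
    Simplification: only an immediately following U+0307 is detected.
    """
    s = s.replace("I" + DOT_ABOVE, "i")  # I + hard dot -> i, dot removed
    s = s.replace(CAPITAL_I_DOT, "i")    # precomposed dotted capital
    s = s.replace("I", DOTLESS_I)        # any remaining I is dotless
    return s.lower()                     # default mapping for the rest


print(turkic_upper("diyarbak\u0131r"))   # DİYARBAKIR
print(turkic_lower("D\u0130YARBAKIR"))   # diyarbakır
print(turkic_lower("I\u0307STANBUL"))    # istanbul
print("DIYARBAKIR".lower())              # diyarbakir (default, untailored)
```

The uppercase helper never inspects neighbouring characters, while the
lowercase helper has to handle the "I followed by U+0307" sequence before
it can map the remaining capital I letters, which is the contextual part.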

Received on Wed Jun 25 2014 - 07:39:35 CDT