Re: Swapcase for Titlecase characters

From: Marcel Schneider
Date: Sat, 19 Mar 2016 17:40:43 +0100 (CET)

On Sat Mar 19, 2016 12:54:51, Martin J. Dürst wrote:

> On 2016/03/19 04:33, Marcel Schneider wrote:
> > On Fri, Mar 18, 2016, 08:43:56, Martin J. Dürst wrote:
> >> b) Convert to upper (or lower), which may simplify implementation.
> >> For example, 'Džinsi' (jeans) would become 'DžINSI' with a), 'DŽINSI' (or
> >> 'džinsi') with b), and 'dŽINSI' with c). For another example, 'ᾨδή' would
> >> become 'ᾨΔΉ' with a), 'ὨΙΔΉ' (or 'ᾠΔΉ') with b), and 'ὠΙΔΉ' with c).
> > Looking at your examples, I would add a case that typically occurs for swapcase to be applied:
> > ‘ᾠΔΉ’ (cited [erroneously] as a result of option b) that is to be converted to ‘ᾨδή’, and ‘džINSI’, that is to become ‘Džinsi’.
> First, what do you mean with "erroneously"?

The bracketed word was only meant to acknowledge that when ‘ᾨδή’ is converted to lowercase, as assumed in option “b-lower”, it becomes ‘ᾠδή’, whereas ‘ᾠΔΉ’ is a typical candidate for swapcase; so I could reuse it “as is” to illustrate the fourth case.

> Second, did I get this right that your additional case (let's call it
> d)) would cycle through the three options where available:
> lower -> title -> upper -> lower.

I’m afraid that swapcase, as I see it, is not a roundtrip method, so I had some awkward moments today when I thought about how to implement it. As far as I can see, there are two pairs:

I: lowercase → titlecase (needed to correct the initials where the user pressed the shift modifier)
II: uppercase → lowercase (needed to correct the body of the words input while caps lock was on)

That typically matches what happens when caps lock is accidentally on and the user writes normally―on a keyboard that includes digraphs and uses the SGCaps feature for them, like this:

Modifier        None     Shift
CapsLock off    Lower    Title
CapsLock on     Upper    Lower
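
A minimal Python sketch of the two pairs above, offered as an illustration and not as the method of any word processor. The name `fix_capslock_word` is hypothetical. It relies on Python's built-in `str.title()` and `str.lower()`, applied one character at a time, and it leaves titlecase letters (general category Lt) unchanged; that last choice is an assumption, since the table never produces such input while caps lock is on.

```python
import unicodedata

def fix_capslock_word(word: str) -> str:
    """Undo typing done with caps lock accidentally on (hypothetical helper).

    Pair I:  lowercase letter (Ll) -> titlecase form
    Pair II: uppercase letter (Lu) -> lowercase form
    Titlecase letters (Lt) and uncased characters pass through unchanged;
    that is an assumption, not something the table prescribes.
    """
    out = []
    for ch in word:
        category = unicodedata.category(ch)
        if category == "Ll":
            # Per-character titlecasing, e.g. U+01C6 -> U+01C5, 'a' -> 'A'.
            out.append(ch.title())
        elif category == "Lu":
            # Per-character lowercasing, so no context rules (such as
            # final sigma) are applied here.
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)
```

With the precomposed digraph U+01C6 followed by "INSI", this sketch returns U+01C5 followed by "insi"; with U+1FA0 U+0394 U+0389 it returns U+1FA8 U+03B4 U+03AE. If the digraph is typed as two separate letters, each of them is handled on its own.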

Correcting keyboard input done with the wrong caps lock state is the only situation I can see where swapcase is needed and thus likely to be used. This is why the swapcase method is implemented in word processors as part of an optional autocorrect feature that neutralizes the effect of starting a sentence normally while caps lock is on: after the user completes the input of an uppercase word with an initial lowercase letter, the word is automatically swapcased and caps lock is turned off.

However, now that I have tested it with the digraph from the examples (input through the composer of the keyboard layout), I find that it doesnʼt work at all in one word processor, while in another it works but uppercases the initial lowercase digraph instead of titlecasing it. [These may be considered effects of “streamlined” implementations that drop the less frequent cases.]

I donʼt believe that it would be useful to make swapcase a roundtrip method, and anyway it would be weird because of the letters with three case forms. The case conversion cycle you describe above usually applies to words (and doesnʼt work correctly in either of the two tested word processors when an initial DZ digraph is present), while most letters have identical values for Titlecase_Mapping and Uppercase_Mapping, and usually there is no means to flag them with “Titlecase_State”. This might be one more reason why current implementations of swapcase donʼt match the expected behavior for digraphs.
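
To get a feeling for how few letters have distinct titlecase and uppercase forms, here is a rough Python sketch. It assumes that Python's `str.title()` and `str.upper()` are an acceptable stand-in for the mappings; they apply the full case mappings of whatever Unicode version ships with the interpreter, so the result is not strictly the simple Titlecase_Mapping and Uppercase_Mapping values. Both function names are hypothetical.

```python
import sys

def differing_title_upper():
    """Return characters whose titlecase and uppercase results differ.

    Based on str.title() and str.upper() of the running interpreter,
    i.e. full case mappings of its bundled Unicode version.
    """
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if chr(cp).title() != chr(cp).upper()
    ]

def cased_characters():
    """Return characters that change under upper() or lower()."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if chr(cp).upper() != chr(cp) or chr(cp).lower() != chr(cp)
    ]
```

The first list contains the three forms of the digraph (U+01C4, U+01C5, U+01C6) and the Greek letters with ypogegrammeni such as U+1FA0, and it is much shorter than the second list.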

> > As about decomposing digraphs and ypogegrammeni to apply swapcase: That probably would be doing no good,
> > as itʼs unnecessary and users wonʼt expect it.
> Why do you say "users won't expect it"? For those users not aware of the
> encoding internals, I'd indeed guess that's what users would expect, at
> least in the Croatian case.

That depends on what the expected result is. If the swapcase method is meant to correct inverted casing, users wouldnʼt like to see the digraphs decomposed, all the less so because in the languages under consideration the DZ digraph is a letter of the alphabet between ‘D’ and ‘Đ’, so users are well aware of it.

> For Greek, it may be different; it depends
> on the extent to which the iota is seen as a letter vs. seen as a mark.

Here again the user inputs a precomposed letter with iota subscript, because he just wants a capitalized word, not an uppercase one. And here again the autocorrect doesnʼt work in one word processor, while in the other it applies uppercasing with an uppercase iota adscript (while the rest of the word is lowercase) instead of capitalization with a lowercase iota adscript or an iota subscript, depending on conventions and preferences.
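
The difference between uppercasing and capitalizing such a letter can be made visible with a small Python sketch. The helper `case_forms` is hypothetical; it only reports what Python's built-in `str.lower()`, `str.title()` and `str.upper()` return for a single character, as lists of code points.

```python
def case_forms(ch: str) -> dict:
    """Return the code points of each case form of a single character."""
    return {
        "lower": [f"U+{ord(c):04X}" for c in ch.lower()],
        "title": [f"U+{ord(c):04X}" for c in ch.title()],
        "upper": [f"U+{ord(c):04X}" for c in ch.upper()],
    }

# U+1FA0: Greek small letter omega with psili and ypogegrammeni.
forms = case_forms("\u1fa0")
# forms["title"] is the single precomposed titlecase letter U+1FA8;
# forms["upper"] is two code points, U+1F68 followed by capital iota U+0399.
```

Uppercasing thus yields a capital omega followed by an uppercase iota, while titlecasing keeps a single precomposed letter; how the iota of the latter is displayed is left to the font.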

Letʼs take that as proof of how hard it is to implement swapcase with digraph support.

I canʼt conclude this reply better than with Asmus Freytagʼs words of Fri, 1 Jan 2016 12:09:13 -0800: [1]

> Unicode aims to be expressive enough to model all plain text. That means, it inherits the non-reducible complexity of text. Even the insight that the complexity is non-reducible would be a big step forward.



[1] Re: Unicode in the Curriculum? from Asmus Freytag on 2016-01-01.
Received on Sat Mar 19 2016 - 11:44:24 CDT
