Re: Dealing with Georgian capitalization in programming languages

From: Markus Scherer via Unicode <unicode_at_unicode.org>
Date: Tue, 2 Oct 2018 13:12:36 -0700

On Tue, Oct 2, 2018 at 12:50 AM Martin J. Dürst via Unicode <
unicode_at_unicode.org> wrote:

> ... The only
> operation that can cause problems is 'capitalize'.
>
> When I say "cause problems", I mean producing mixed-case output. I
> originally thought that 'capitalize' would be fine. It is fine for
> lowercase input: It stays lowercase because Unicode Data indicates that
> titlecase for lowercase Georgian letters is the letter itself. But it
> will produce the apparently undesirable Mixed Case for ALL UPPERCASE input.
>
> My questions here are:
> - Has this been considered when Georgian Mtavruli was discussed in the
> UTC?
> - How have any other implementers (ICU,...) addressed this, in
> particular the operation that's called 'capitalize' in Ruby?
>

By default, ICU toTitle() functions titlecase at word boundaries (with
adjustment) and lowercase all else.
That is, we implement the Unicode Standard, section 3.13 (Default Case
Algorithms), Default Case Conversion rule R3 toTitlecase(X), except that we
modified the default boundary adjustment.

You can customize the boundaries (e.g., only the start of the string).
We have options for whether and how to adjust the boundaries (e.g., adjust
to the next cased letter) and for copying, not lowercasing, the other
characters.
See C++ and Java class CaseMap and the relevant options.
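
To make the difference concrete, here is a minimal sketch in plain Python. It
is not ICU's API: the function name capitalize and the lowercase_rest flag are
hypothetical, chosen only to contrast "lowercase all else" with "copy, do not
lowercase" when only the start of the string is titlecased. It assumes a
Python build whose Unicode data includes Georgian Mtavruli (Unicode 11 or
later, i.e. Python 3.7+).

```python
def capitalize(text, lowercase_rest=True):
    """Titlecase the first character; lowercase or copy the remainder.

    Hypothetical helper for illustration only, not an ICU function.
    """
    if not text:
        return text
    head, tail = text[0], text[1:]
    # str.title() on a single character applies its Unicode titlecase mapping.
    return head.title() + (tail.lower() if lowercase_rest else tail)


# "Tbilisi" in Mkhedruli (lowercase) and in Mtavruli (uppercase).
MKHEDRULI = "\u10d7\u10d1\u10d8\u10da\u10d8\u10e1\u10d8"
MTAVRULI = "\u1c97\u1c91\u1c98\u1c9a\u1c98\u1ca1\u1c98"

# Lowercase input is unchanged: Mkhedruli letters titlecase to themselves.
from_lower = capitalize(MKHEDRULI)

# Uppercase input becomes mixed case: the first letter stays Mtavruli,
# the remaining letters are lowercased to Mkhedruli.
mixed = capitalize(MTAVRULI)

# Copying instead of lowercasing keeps the all-Mtavruli input intact.
copied = capitalize(MTAVRULI, lowercase_rest=False)
```

With the copy option, all-uppercase Georgian input passes through unchanged,
which avoids the mixed-case result described in the quoted message.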

markus
Received on Tue Oct 02 2018 - 15:13:13 CDT