Re: Dealing with Georgian capitalization in programming languages

From: Martin J. Dürst via Unicode
Date: Tue, 9 Oct 2018 16:47:14 +0900

Hello Ken, others,

On 2018/10/03 06:43, Ken Whistler wrote:

> But it seems to me that the problem you are citing can be avoided if you
> simply rethink what your "capitalize" means. It really should be
> conceived of as first lowercasing the *entire* string, and then
> titlecasing the *eligible* letters -- i.e., usually the first letter.
> (Note that this allows for the concept that titlecasing might then be
> localized on a per-writing-system basis -- the issue would devolve to
> determining what the rules are for "eligible" letters.) But the simple
> default would just be to titlecase the initial letter of each "word"
> segment of a string.
> Note that conceived this way, for the Georgian mappings, where the
> titlecase mapping for Mkhedruli is simply the letter itself, this
> approach ends up with:
> capitalize(mkhedrulistring) --> mkhedrulistring
> capitalize(MTAVRULISTRING) ==> titlecase(lowercase(MTAVRULISTRING)) -->
> mkhedrulistring
> Thus avoiding any mixed case.

I have been thinking through this. It seems quite appealing.
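
To make the quoted proposal concrete, here is a minimal Ruby sketch of it,
restricted to Georgian. It is only an illustration under stated assumptions:
the method names are hypothetical, and the Mtavruli-to-Mkhedruli mapping is
written out by hand (a constant offset between U+1C90..U+1CBF and
U+10D0..U+10FF) so that the result does not depend on which Unicode version a
given Ruby was built with.

```ruby
# Hypothetical helpers illustrating "lowercase the entire string, then
# titlecase the eligible letters" for Georgian only.

# Mtavruli (U+1C90..) sits at a constant offset above Mkhedruli (U+10D0..).
MTAVRULI_OFFSET = 0x1C90 - 0x10D0

# Assigned Mtavruli letters: U+1C90..U+1CBA and U+1CBD..U+1CBF.
def mtavruli?(codepoint)
  (0x1C90..0x1CBA).cover?(codepoint) || (0x1CBD..0x1CBF).cover?(codepoint)
end

# Step 1: lowercase the entire string (Mtavruli becomes Mkhedruli).
def georgian_lowercase(str)
  str.each_char.map do |ch|
    cp = ch.ord
    mtavruli?(cp) ? (cp - MTAVRULI_OFFSET).chr(Encoding::UTF_8) : ch
  end.join
end

# Step 2: titlecase the initial letter. The titlecase mapping of a
# Mkhedruli letter is the letter itself, so nothing changes here.
def georgian_titlecase_initial(str)
  str
end

def georgian_capitalize(str)
  georgian_titlecase_initial(georgian_lowercase(str))
end

mkhedruli = "\u10E1\u10D0\u10E5" # Mkhedruli san, an, khar
mtavruli  = "\u1CA1\u1C90\u1CA5" # the same letters in Mtavruli

puts georgian_capitalize(mkhedruli) == mkhedruli # true
puts georgian_capitalize(mtavruli)  == mkhedruli # true, no mixed case
```

Both inputs end up as plain Mkhedruli, which matches the two results given in
the quoted text.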

But I'm concerned there may be some edge cases. I have been able to come
up with two so far:

- Applying this to a string starting with uppercase sharp s (ẞ, U+1E9E).
   Lowercasing first and then titlecasing may change ẞ → ß → Ss.
- Using the 'capitalize' method to (try to) get the titlecase
   mapping of a MTAVRULI character. With the two-step definition,
   the result would be Mkhedruli, not the titlecase mapping.
   (There's currently no other way in Ruby to get the titlecase mapping.)

There may be others. If you have any in mind, I'd appreciate hearing
about them.

This makes me wonder why the UTC didn't simply declare the titlecase
mapping of MTAVRULI to be mkhedruli. Was this considered or not? The
way things are currently set up, there seems to be no benefit in
MTAVRULI being its own titlecase, because in actual use that requires
additional processing.

Regards, Martin.
Received on Tue Oct 09 2018 - 02:47:52 CDT
