Re: UCA and Russian letter Ё

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Fri, 21 Dec 2012 10:53:07 -0800

Resending my earlier reply. Apparently, by default, Gmail sends subject
lines in KOI8-R if they contain Cyrillic, and unicode.org rejects those as
likely spam. I just changed my Gmail settings to "Use Unicode (UTF-8)
encoding for outgoing messages" and hope this goes through. (*Please change
the subject line* if you want to discuss *this* issue.)

My earlier reply was:

Theoretically, it is possible to select collation elements based on the
proximity of word boundaries or other criteria. However, I don't know if
there is an implementation that has that built in. ICU (one of the commonly
used implementations of UCA+CLDR) does not.

It sounds like the secondary difference is OK for sorting, but you are
looking to customize an alphabetic index such that there is a separate
"bucket" for words beginning with Ё. I think the best approach would be to
do that with some custom code that looks for Ё as the first character, in
addition to the regular bucketing and sorting.
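
To illustrate the kind of custom code meant here, a minimal sketch in
Python follows. It is not ICU code. The names sort_key, regular_bucket,
custom_bucket and build_index are hypothetical, and sort_key is only a
rough stand-in for a real UCA/CLDR collator: it assumes Ё is equal to Е at
the primary level and sorts after it at the secondary level.

```python
import unicodedata

YO = "\u0401"  # Ё, CYRILLIC CAPITAL LETTER IO
YE = "\u0415"  # Е, CYRILLIC CAPITAL LETTER IE


def sort_key(word):
    """Rough stand-in for a collator sort key (assumption, not UCA itself).

    Primary level: Ё is folded to Е. Secondary level: Ё sorts after Е.
    """
    folded = unicodedata.normalize("NFC", word).upper()
    primary = folded.replace(YO, YE)
    secondary = tuple(1 if ch == YO else 0 for ch in folded)
    return (primary, secondary)


def regular_bucket(word):
    """Regular bucketing: the first letter at the primary level (Ё -> Е)."""
    return sort_key(word)[0][:1]


def custom_bucket(word):
    """Custom step: look for Ё as the first character, else fall back."""
    first = unicodedata.normalize("NFC", word)[:1].upper()
    return YO if first == YO else regular_bucket(word)


def build_index(words):
    """Return a list of (bucket label, sorted words) pairs in index order."""
    buckets = {}
    for word in sorted(words, key=sort_key):
        buckets.setdefault(custom_bucket(word), []).append(word)
    # Order the labels with the same key so the Ё bucket follows Е.
    return [(label, buckets[label]) for label in sorted(buckets, key=sort_key)]


if __name__ == "__main__":
    for label, entries in build_index(["Ель", "Ёж", "Еда", "Ёлка", "Жук"]):
        print(label, entries)
```

In a real application the sort_key stand-in would be replaced by the sort
key of whatever collator is already in use. Only the check for Ё as the
first character is the custom part.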

Best regards,
markus

-- 
Google Internationalization Engineering
Received on Fri Dec 21 2012 - 12:56:37 CST
