Re: UCA and Russian letter Ё

From: Leo Broukhis <leob_at_mailcom.com>
Date: Fri, 21 Dec 2012 13:00:51 -0800

On Fri, Dec 21, 2012 at 11:35 AM, Jukka K. Korpela <jkorpela_at_cs.tut.fi> wrote:
> 2012-12-21 21:05, Leif Halvard Silli wrote:
>
>> My Moscow Russian-Norwegian from 1987 and my Pocket Oxford Russian
>> Dictionary from 2003 agree that both list words on Ё and Е under the
>> same category – namely, under the letter Е.
>
> This appears to be the case in any serious dictionary.

You're right. In an influential orthographic dictionary the difference
is secondary; e.g. ёлка is between елисейский дворец and ёлки-палки:
http://lopatina-slovar.com/description/elka/34736
(The site's database was built by scanning a printed dictionary.)
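
To make "the difference is secondary" concrete, here is a minimal
sketch in Python. It is not UCA and uses no real collation weights: the
function name and the two-level key are my own simplification, and it
assumes precomposed ё (U+0451). The primary level ignores the е/ё
difference; the secondary level only breaks ties.

```python
# Simplified two-level sort key (an illustration, NOT real UCA weights).
# Assumes precomposed "ё" (U+0451), not "е" + combining diaeresis.
def secondary_yo_key(word):
    w = word.lower()
    primary = w.replace("ё", "е")             # е and ё are equal at the primary level
    secondary = tuple(ch == "ё" for ch in w)  # ё sorts after е only on a primary tie
    return (primary, secondary)

words = ["ёлки-палки", "ель", "ёлка", "елисейский дворец"]

by_code_point = sorted(words)                      # ё (U+0451) lands after all of а..я
by_secondary = sorted(words, key=secondary_yo_key)

print(by_code_point)
print(by_secondary)
```

With this key, ёлка falls between елисейский дворец and ёлки-палки, as
in the printed dictionary, whereas plain code point order pushes every
ё-initial word past ель.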

However, the preferences could change, as electronic dictionaries seem
to demonstrate.

> It is of course possible that some people would prefer treating “ё” as a
> primarily different letter. But it’s rather illogical to require that it be
> treated that way at the start of a word only. I don’t think collation rules
> need to accommodate such preferences.

Granted, not yet, but the argument by itself is invalid. Unicode
collation rules are descriptive: if, for example, a language happens to
sort accents backwards, that rule has to be, and is, accommodated
despite its apparent illogicality. Along the same lines, if a language
happens to make the distinction discussed in this thread, it has to be
accommodated just as well.
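
For readers unfamiliar with backward accent sorting, the effect can be
sketched as follows. This is a toy model, not the UCA implementation:
the function name is hypothetical, accents are weighted by their code
points rather than by real collation weights, and only one combining
mark per base letter is tracked.

```python
import unicodedata

# Toy model of forward vs. backward secondary comparison (NOT real UCA).
def accent_key(word, backward=False):
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] = ord(ch)   # accent weight attaches to the preceding base letter
        else:
            primary.append(ch)
            secondary.append(0)       # 0 = unaccented
    if backward:
        secondary.reverse()           # compare accents starting from the end of the string
    return (primary, secondary)

words = ["côté", "coté", "côte", "cote"]

forward = sorted(words, key=accent_key)
backward = sorted(words, key=lambda w: accent_key(w, backward=True))

print(forward)
print(backward)
```

The primary level is identical in both cases; only the direction in
which the accent differences are compared changes, and that alone swaps
côte and coté.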

Also, "In several languages the rules have changed over time, and so
*older dictionaries may use a different order than modern ones* [emph.
mine - LB]. Furthermore, collation may depend on use. For example,
German dictionaries and telephone directories use different
approaches."
[http://en.wikipedia.org/wiki/Collation]

The distinction between the two collation methods in German (umlauts as
a secondary difference vs. expanded umlauts) is prominent enough to be
mentioned in UCA. Luckily for Germans, both methods are covered by the
algorithm thanks to the requirements of other languages.
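
A rough sketch of the two German methods, under the same caveats as any
toy model: the function names and the mapping table are mine, ß and
real collation weights are ignored, and only ä, ö, ü are handled.

```python
# Toy keys for the two German conventions (NOT real UCA weights).
UMLAUT_BASE = {"ä": "a", "ö": "o", "ü": "u"}

def dictionary_key(word):
    # Dictionary style: the umlaut is only a secondary difference from the base vowel.
    w = word.lower()
    primary = "".join(UMLAUT_BASE.get(ch, ch) for ch in w)
    secondary = tuple(ch in UMLAUT_BASE for ch in w)
    return (primary, secondary)

def phonebook_key(word):
    # Phone book style: the umlaut is expanded to base vowel + "e" at the primary level.
    w = word.lower()
    primary = "".join(UMLAUT_BASE[ch] + "e" if ch in UMLAUT_BASE else ch for ch in w)
    secondary = tuple(ch in UMLAUT_BASE for ch in w)
    return (primary, secondary)

names = ["Muller", "Müller", "Mueller"]

dictionary_order = sorted(names, key=dictionary_key)
phonebook_order = sorted(names, key=phonebook_key)

print(dictionary_order)
print(phonebook_order)
```

The same three names come out in a different order under each key,
which is the kind of use-dependent difference the quoted passage
describes.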

My question is as follows: if this feature were to be supported, would
UCA have to be modified (e.g. by adding another bit flag, "word-initial
primary", next to the existing "backward secondary"), or is there a way
to achieve the "new Russian online collation" within the existing UCA,
without modifying the strings to be sorted before the algorithm is
applied?
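
To pin down the ordering I am asking about, here is a minimal sketch
written as a plain Python sort key. It deliberately sits outside UCA and
says nothing about whether UCA can express the rule; the function name,
the half-step weight, and the choice to look only at the first character
of the whole string (not of each word in a phrase) are all my
assumptions.

```python
# Illustration of the desired ordering only (NOT UCA, hypothetical rule):
# string-initial ё is its own primary letter between е and ж;
# elsewhere ё differs from е only at the secondary level.
def word_initial_primary_key(word):
    w = word.lower()
    primary = []
    for i, ch in enumerate(w):
        if ch == "ё":
            # Half-step weight places initial ё after е (U+0435) and before ж (U+0436).
            primary.append(ord("е") + 0.5 if i == 0 else ord("е"))
        else:
            primary.append(ord(ch))
    secondary = tuple(ch == "ё" for ch in w)
    return (primary, secondary)

words = ["жук", "ёлка", "мел", "ель", "мёд"]
online_order = sorted(words, key=word_initial_primary_key)

print(online_order)
```

Here ёлка sorts after ель because its initial letter is a separate
primary letter, while мёд still sorts before мел because non-initial ё
is primary-equal to е. With a purely secondary ё, ёлка would come before
ель instead.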

Leo
Received on Fri Dec 21 2012 - 15:02:26 CST
