Re: UCA Revised Latin?
We should consider whether to make the following changes in the next version of the UCA.
[For the meeting, please also print http://www.unicode.org/charts/collation/chart_Latin.html]
1. Make alternate forms of letters (like the following) be secondary differences from the 'base' letter.
Outliers: the following appear unrelated to the 'base' letter that they follow in UCA order, so they should be left where they are.
2. Make "æ" be a secondary difference from "ae".
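To make the difference between the two treatments concrete, here is a minimal sketch of a toy two-level collation. It is not the UCA algorithm and uses no real UCA weights: the letter orders, the `VARIANTS` table, and the function names `key_separate` and `key_secondary` are all assumptions made up for illustration. It only shows how sort order changes when "æ" (and, as an example of an alternate form, "ł") is a secondary variant rather than a separate primary letter.

```python
# Toy illustration only: weights and tables are hypothetical, not taken from the UCA.
# Input is assumed to contain only a-z, "æ" and "ł"; other characters raise ValueError.

BASE_ORDER = "abcdefghijklmnopqrstuvwxyz"
# Assumed placement of "æ" and "ł" as distinct primary letters after their base letter.
SEPARATE_ORDER = "aæbcdefghijklłmnopqrstuvwxyz"

# Hypothetical variant table: character -> list of (base letter, secondary weight).
# Secondary weight 0 means "plain letter"; 1 marks the variant form.
VARIANTS = {
    "æ": [("a", 1), ("e", 0)],  # sketch of: ae << æ
    "ł": [("l", 1)],            # sketch of: l << ł
}


def key_separate(word):
    """Sort key where variant letters are their own primary letters."""
    return tuple(SEPARATE_ORDER.index(ch) for ch in word.lower())


def key_secondary(word):
    """Sort key where variant letters differ from the base only at level 2."""
    primary, secondary = [], []
    for ch in word.lower():
        for base, sec in VARIANTS.get(ch, [(ch, 0)]):
            primary.append(BASE_ORDER.index(base))
            secondary.append(sec)
    # Tuples compare level by level: all primaries first, then secondaries.
    return tuple(primary), tuple(secondary)


words = ["æon", "afar", "aerial", "aeon"]
print(sorted(words, key=key_separate))   # ['aeon', 'aerial', 'afar', 'æon']
print(sorted(words, key=key_secondary))  # ['aeon', 'æon', 'aerial', 'afar']
```

With the secondary treatment, "æon" sorts directly next to "aeon" instead of after every word beginning with a plain "a".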
For reference, here is an email related to the topic.
> ----- Original Message -----
> From: Åke Persson
> To: Mark Davis
> Sent: Wed, 2003 Dec 31 06:36
> Subject: ae << æ etc.
> I have browsed the latest ICU collations. Here are a few comments.
> The inclusion of ae << æ in several languages resembles my experience when I
> implemented the UCA in Mimer SQL. The next thing that came up was letters with
> stroke. For example, the Polish letter L-stroke, properly used in Polish names,
> did not match a Swedish or English search for names containing L. L-stroke is
> expected to be L with a stroke "accent", except for Polish (and Sorbian).
> <<Lodz.jpg>> is a snapshot from a Swedish encyclopædia (note also "oe"). To make
> a long story short, it all ended up in the European Ordering Rules (EOR)
> concept, where the base letters in the Latin alphabet are only A-Z. The first
> step was to create an EOR tailoring as the base. Languages with additional
> letters in their alphabets were tailored on top of the EOR tailoring. The next
> step was improvement of space and performance, by making EOR the default, and to
> create a tailoring for the default UCA instead (at least needed for the
> conformance test).
> Here's an overview of the tailorings:
> Please, take a closer look at:
> Catalan, Croatian, Faroese, Icelandic, Latvian, Lithuanian, Romanian, and Slovak
> compared to the corresponding ICU collations.
> My sources are documented here:
> The E-ogonek (old Sami and Icelandic Ä) as a variant of Ä in Faroese, Finnish,
> Greenlandic, Norwegian, and Swedish looks a bit goofy. I would rather expect a
> search match for E in Polish and Lithuanian names containing E-ogonek. I think
> it's better to have a specific locale for Sami.
> [before 1] is used extensively in the ICU collations. The collation
> definitions are easier to read if [before 1] is used only when necessary.
> Happy New Year!
> Åke Persson
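
As a rough illustration of the search problem described in the email (a query for "Lodz" that fails to match "Łódź"), the sketch below compares strings at a simulated primary level. It is an assumption-laden toy, not how ICU or Mimer SQL implement matching: the `STROKE_BASES` table and the functions `primary_key` and `matches` are hypothetical names introduced here. It relies on the fact that canonical decomposition (NFD) separates the acute accents in "ó" and "ź" but leaves "ł" intact, so the stroke has to be handled by an explicit mapping.

```python
import unicodedata

# Hypothetical, partial table treating the stroke as an "accent" on the base letter.
STROKE_BASES = {"ł": "l"}


def primary_key(text, stroke_as_accent=True):
    """Reduce text to a simulated primary-level key (toy approximation)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop combining marks such as the acute accent
        if stroke_as_accent:
            ch = STROKE_BASES.get(ch, ch)  # fold stroke letters to their base
        out.append(ch)
    return "".join(out)


def matches(query, name, stroke_as_accent=True):
    """True if query occurs in name when both are compared at the primary level."""
    return primary_key(query, stroke_as_accent) in primary_key(name, stroke_as_accent)


print(matches("Lodz", "Łódź"))                          # True
print(matches("Lodz", "Łódź", stroke_as_accent=False))  # False
```

A Polish tailoring would presumably disable the stroke folding, keeping "ł" a distinct letter, which corresponds to `stroke_as_accent=False` in this sketch.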