Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #2898 (closed: fixed)

Opened 9 years ago

Last modified 5 months ago

Search collators that modify Latin-script behavior

Reported by: pedberg
Owned by: pedberg
Component: other
Review: emmons

Description (last modified by pedberg)

Now that we have a root-level search collator (per cldrbug 2182), which has some modifications from UCA for Thai/Lao and Arabic (and soon Korean), various languages need search collators that override the root search collator's handling of Latin script (which follows DUCET).

For example: The search collator for Turkish should treat as primary-different the pairs c/ç, i/ı, o/ö, and u/ü (as does the normal Turkish collator).
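
To illustrate what "primary-different" means for search, here is a minimal Python sketch. It is a toy model, not ICU or CLDR code: the function names, the `TURKISH_DISTINCT` set, and the approach of approximating primary equivalence by stripping combining marks are all assumptions made for illustration. It only covers the c/ç, o/ö, and u/ü pairs.

```python
import unicodedata

def root_like_primary_key(text):
    """Toy stand-in for a primary key that ignores accents and case."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop combining marks so that, e.g., "ç" and "c" produce the same key.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tailored_primary_key(text, distinct_letters):
    """Like root_like_primary_key, but keeps the given letters distinct."""
    out = []
    for ch in unicodedata.normalize("NFC", text.casefold()):
        if ch in distinct_letters:
            out.append(ch)  # treated as its own letter, not as base + accent
        else:
            out.append(root_like_primary_key(ch))
    return "".join(out)

# Hypothetical tailoring set taken from the pairs named in this ticket.
TURKISH_DISTINCT = frozenset("çöü")

def turkish_key(text):
    return tailored_primary_key(text, TURKISH_DISTINCT)

def search_matches(needle, haystack, key):
    """Naive substring search on primary keys."""
    return key(needle) in key(haystack)
```

With this model, `search_matches("gul", "gül", root_like_primary_key)` matches, while `search_matches("gul", "gül", turkish_key)` does not, which is the behavior requested for the Turkish search collator.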

Change History

comment:1 Changed 9 years ago by mark

  • Owner changed from somebody to pedberg
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Milestone set to 1.9m2

comment:2 Changed 9 years ago by pedberg

  • Milestone changed from 1.9m2 to 1.9RC

comment:3 Changed 8 years ago by pedberg

Based on feedback from Apple's languages group:

We also need search collators that make the same primary distinctions as the normal collator for the following languages (i.e. ones where we should not use the DUCET primary equivalences for Latin script that the root search collator uses):
ca, da, de (use phonebook style), es, fi, hr, nb, sv, tr

Don't need special search collators for the following: hu, pl, uk. Accented letters that these languages normally treat as primary distinct (when DUCET does not) should not be treated as primary distinct for search. This is partly because these letters may have been hard to type in the past, so existing text may not have them marked up correctly, and partly because people are accustomed to not typing the accents even when they should officially be used.

Don't need special search collators for the following since they are handled by the root search collator: ar, th, he

Don't have data for other languages yet.

comment:4 Changed 8 years ago by pedberg

  • Description modified
  • Summary changed from "Turkish needs a search collator to override root's" to "Search collators that modify Latin-script behavior"

comment:5 Changed 8 years ago by kent.karlsson14@…

ca, da, de (use phonebook style), es, fi, hr, nb, sv, tr

Add at least nn, fo, is, se, kl, az to the list.

I'm not sure why the fallback for "search" collator isn't the "standard" collator. Falling back to root "search" collator seems to be a bad idea.

comment:6 Changed 8 years ago by pedberg

  • Status changed from assigned to accepted
  • Review set to emmons

Kent, thanks for the suggestion; added nn, fo, is, se, kl assuming the expectation for special-character handling is like that for da, fi, nb, sv; added az assuming the expectation for special-character handling is like that for tr.

As for why the fallback is to root search collator first, that was discussed in the design doc http://cldr.unicode.org/development/design-proposals/search-collators. Basically, the idea is that the number of locales that have primary-level DUCET tailorings that should be used in search is a relatively small percentage of the total; in most cases we want the root search collator rather than anything from the locale.
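
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual CLDR or ICU lookup: the function names and the `HAS_SEARCH` set are hypothetical, and the parent chain is approximated by truncating underscore-separated subtags.

```python
def locale_chain(locale):
    """Yield the locale, then its parents by truncation, then 'root'."""
    parts = locale.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()
    yield "root"

def resolve_search_collator(locale, locales_with_search):
    """Return the first locale in the chain that has its own search collator.

    Note that the locale's standard collator is never consulted: a locale
    without search data ends up at the root search collator.
    """
    for candidate in locale_chain(locale):
        if candidate in locales_with_search:
            return candidate
    raise LookupError("no root search collator defined")

# Hypothetical set of locales that carry their own search collator.
HAS_SEARCH = {"root", "tr", "de"}
```

Under these assumptions, `resolve_search_collator("tr_TR", HAS_SEARCH)` gives `"tr"`, while `resolve_search_collator("hu", HAS_SEARCH)` gives `"root"` even though hu has its own standard tailoring.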

Anyway, this bug is about fixing those minority cases where locale-specific search does need to do something different. Since we don't have collation import yet, for these cases we just copy the root search collator and then add in (usually) the standard rules for the locale. For "de" we instead use the phonebook rules (preferred for search); for "ca" and "nb" we use slight modifications of the standard rules. The input for this came from translators and language experts in Apple's Language Group.
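
As a rough sketch of the copy-and-append approach described above (the helper names are hypothetical and the rule strings are opaque placeholders, not real CLDR rule syntax):

```python
def search_rule_source(locale):
    """Pick which of the locale's rule sets to append to the root search rules.

    Per this ticket, "de" uses phonebook rules and the others use standard
    rules. The slight modifications for "ca" and "nb" are not modeled here.
    """
    return "phonebook" if locale == "de" else "standard"

def build_search_rules(locale, root_search_rules, locale_rules_by_type):
    """Copy the root search rules, then append the chosen locale rules."""
    source = search_rule_source(locale)
    locale_rules = locale_rules_by_type[source]
    return root_search_rules.rstrip() + "\n" + locale_rules.lstrip()
```

For example, `build_search_rules("de", "ROOT", {"standard": "STD", "phonebook": "PB"})` returns `"ROOT\nPB"`. Once collation import is available, the copy step could presumably be replaced by a reference to the root search collator.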

comment:7 Changed 8 years ago by emmons

  • Status changed from accepted to closed
  • Resolution set to fixed

comment:8 Changed 5 months ago by mark

  • Component changed from main to other