Re: Sorting notation

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Fri, 14 Feb 2014 08:26:07 -0800

You need a reset point to say where in the UCA/CLDR universe this rule
chain goes.
http://www.unicode.org/reports/tr35/tr35-collation.html#Orderings

The default collation puts lowercase first. Normally you reset to a
lowercase character and tailor variations relative to it; otherwise the few
characters you tailor are inconsistent with the rest of Unicode.
Implementations like ICU provide parametric settings (no need for rules) to
specify uppercase first.
http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
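
To illustrate why a setting is enough here, below is a toy sketch in plain
Python. It is not ICU and uses no real collation data: the function
collation_key and its upper_first flag are made up for this example. It
only models the idea that case is a lower-level difference, so flipping
one switch changes the order of case variants without any per-character
rules. In LDML the corresponding setting is caseFirst (locale keyword kf).

```python
def collation_key(word, upper_first=False):
    """Toy two-level sort key (hypothetical helper, not an ICU API)."""
    # Level 1: the letters with case ignored.
    # Level 2: case bits, consulted only when level 1 ties.
    # False sorts before True, so we flag whichever case should come second.
    case_bits = [ch.islower() if upper_first else ch.isupper() for ch in word]
    return (word.lower(), case_bits)


words = ["b", "a", "B", "A", "ab", "Ab"]

# Default-like behavior: lowercase before uppercase.
lower_first = sorted(words, key=collation_key)

# Same data, one switch flipped, no per-character rules needed.
upper_first = sorted(words, key=lambda w: collation_key(w, upper_first=True))

print(lower_first)
print(upper_first)
```

Only the relative order of case variants changes between the two results;
the base letters stay where they were.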

You should only reorder characters that the default order does not already
have where you need them. For example, reset at each base letter, unless
you want to change the order of the base letters relative to each other.
http://www.unicode.org/charts/collation/
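
As a rough sketch of what a reset point does, here is a toy model in plain
Python. It is not ICU and not real UCA data: the five-letter default order,
the function names tailor and sort_key, and the single-character,
primary-level-only handling are all assumptions made for illustration. A
rule chain like "&c < z" is spliced in right after its reset point, and
everything that is not tailored keeps its default position.

```python
def tailor(default_order, rule):
    """Toy model: apply one rule chain such as "&c < z" to a list of characters."""
    reset, *tailored = [part.strip() for part in rule.lstrip("&").split("<")]
    if reset not in default_order:
        # Without a reset point the chain has nowhere to go.
        raise ValueError(f"reset point {reset!r} is not in the default order")
    # Drop tailored characters from their old positions; the rest stay put.
    order = [ch for ch in default_order if ch not in tailored]
    pos = order.index(reset) + 1
    order[pos:pos] = tailored  # splice the chain in right after the reset point
    return order


def sort_key(order):
    """Build a key function that ranks each character by its position in order."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank[ch] for ch in word]


default = list("abcdz")  # stand-in for the default order, not real UCA data
tailored = tailor(default, "&c < z")  # minimal rule: only z moves

words = ["da", "za", "ca", "ba"]
result = sorted(words, key=sort_key(tailored))
print(tailored, result)
```

The rule mentions only the one character that needs to move; a, b, c and d
are never listed because the default order already has them in place.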

See also http://cldr.unicode.org/index/cldr-spec/collation-guidelines,
especially the section on "Minimal Rules".

You can try out collation rules and settings at
http://demo.icu-project.org/icu-bin/locexp?_=root&d_=en&x=col

Best regards,
markus

-- 
Google Internationalization Engineering

Received on Fri Feb 14 2014 - 10:27:14 CST