Unicode collation algorithm - interpretation

From: J M Sykes (mike.sykes@acm.org)
Date: Thu Feb 08 2001 - 08:14:31 EST

In the proposal for better accommodating UCS in SQL, we assumed that a
comparison performed according to UTR#10, "Unicode Technical Standard #10
Unicode Collation Algorithm", would require four parameters, viz.

    Two strings to be compared

    A collation element table

    A maximum level, as mentioned in UTR#10, section 4.3,
    "Form a sort key for each string", which specifies Step 3.
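
To make the four parameters concrete, here is a minimal sketch in Python. It is not the full UTR#10 algorithm: the collation element table is a made-up toy (TOY_TABLE), each character maps to exactly one collation element, and the names sort_key and compare are assumptions for illustration only. Level 1 is taken to distinguish base letters, level 2 accents and level 3 case.

```python
from typing import Dict, List, Tuple

# Hypothetical toy collation element table (an assumption, not real UCA data):
# character -> (level-1 weight, level-2 weight, level-3 weight)
Weights = Tuple[int, int, int]
TOY_TABLE: Dict[str, Weights] = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),  # differs from "a" only at level 3 (case)
    "á": (1, 2, 1),  # differs from "a" only at level 2 (accent)
    "b": (2, 1, 1),
    "B": (2, 1, 2),
}

def sort_key(s: str, table: Dict[str, Weights], max_level: int) -> List[int]:
    """Form a sort key: all level-1 weights, then level 2, and so on,
    stopping at max_level. A zero separates consecutive levels."""
    key: List[int] = []
    for level in range(max_level):
        if level > 0:
            key.append(0)  # level separator
        for ch in s:
            weight = table[ch][level]
            if weight != 0:  # zero weights are ignored at that level
                key.append(weight)
    return key

def compare(s1: str, s2: str, table: Dict[str, Weights], max_level: int) -> int:
    """The four parameters: two strings, a table and a maximum level.
    Returns -1, 0 or 1."""
    k1 = sort_key(s1, table, max_level)
    k2 = sort_key(s2, table, max_level)
    return (k1 > k2) - (k1 < k2)
```

With this toy table, compare("a", "A", TOY_TABLE, 2) reports the strings as equal, while the same call with a maximum level of 3 does not, so the same table gives different results at different levels.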

SQL already has the concept of a 'collation', each identified by a
<collation name>, but does not accommodate the notion that the same
collation element table can be applied at different levels.

In our proposal, we have assumed that <collation name> identifies a
collation element table, and have extended SQL syntax to allow the user to
specify the fourth parameter (or leave it to be defaulted).

It has been suggested that SQL <collation name> should instead identify both
collation element table and maximum level.

The second approach might be useful in the case where, for reasons
of performance, sort keys are constructed in advance of being needed, for
example to be stored as 'shadow columns' in SQL base tables, or in indexes.
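
As a rough illustration of why precomputed keys tie a stored column to one table and one maximum level, the following Python sketch builds a 'shadow column' of sort keys. The toy table, the one-element-per-character simplification and the names sort_key and build_shadow_column are all assumptions for illustration, not part of SQL or UTR#10.

```python
from typing import Dict, List, Tuple

# Hypothetical toy collation element table (an assumption, not real UCA data):
# character -> (level-1 weight, level-2 weight, level-3 weight)
Weights = Tuple[int, int, int]
TOY_TABLE: Dict[str, Weights] = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),
    "b": (2, 1, 1),
}

def sort_key(s: str, table: Dict[str, Weights], max_level: int) -> Tuple[int, ...]:
    """Simplified sort key: weights level by level, zero as level separator."""
    key: List[int] = []
    for level in range(max_level):
        if level > 0:
            key.append(0)
        key.extend(table[ch][level] for ch in s if table[ch][level] != 0)
    return tuple(key)

def build_shadow_column(
    values: List[str], table: Dict[str, Weights], max_level: int
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Pair each value with a key computed once, for one table and one level."""
    return [(v, sort_key(v, table, max_level)) for v in values]

# Keys stored at maximum level 2 cannot tell "a" from "A" ...
rows_level2 = build_shadow_column(["b", "A", "a"], TOY_TABLE, max_level=2)
# ... whereas keys stored at maximum level 3 can.
rows_level3 = build_shadow_column(["b", "A", "a"], TOY_TABLE, max_level=3)

# Ordering then needs only the stored keys, not the table.
ordered_level3 = [value for value, _ in sorted(rows_level3, key=lambda r: r[1])]
```

In this sketch the stored key is meaningful only together with the table and level used to build it, which is the situation in which a single name for the pair might be convenient.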

On the other hand, the first approach seems to be more user-friendly in the
case where at least two collation element tables are available, provided
their levels correspond (i.e. provided level 2 means 'case-blind' in both
tables).

Would anyone care to comment?



J M Sykes Email: Mike.Sykes@acm.org
97 Oakdale Drive
Heald Green
Cheshire SK8 3SN
UK Tel: (44) 161 437 5413

