In response to Maurice's query, it is my assessment that neither
the Unicode Collation Algorithm nor its technical equivalent, ISO/IEC 14651,
is up to the mark for syllabic-based ordering. Details may differ
from case to case, but effectively, the issue is as follows.
In syllabic-based ordering, you need first to be able to
identify syllabic boundaries. Then you can weight all the syllables
via a mechanism like the UCA, to get the appropriate multi-level
weighting for primary letters, secondary accents, and so on. Then,
to get the final ordering, you do what is effectively a multi-column
sort, first on the first syllables, then on the second syllables, and so on.
Conceptually, this is like putting all the strings in a spreadsheet,
separating them out so you get one string per row, and one syllable
per column, starting from the first column. Then put a formula in
each cell that computes a multilevel weight for that syllable using
the UCA. Then sort the computed values of all the cells with a multicolumn sort.
So this is really a matter of higher level processing that depends on:
A. A syllabic parser
B. The UCA algorithm for weighting the pieces
C. A multicolumn sorting mechanism
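To make the three pieces concrete, here is a minimal Python sketch. It is
an illustration under loud assumptions, not an implementation of the UCA:
split_syllables() is a hypothetical parser that simply expects syllables
to be pre-marked with "-", and weight() is a toy two-level key (base
letters as primary, combining marks as secondary) standing in for a real
UCA sort key. All function names are invented for this example.

```python
import unicodedata

def split_syllables(word):
    # (A) Hypothetical syllabic parser: assumes "-" already marks the
    # syllable boundaries. A real parser is language-specific.
    return word.split("-")

def weight(piece):
    # (B) Toy stand-in for a UCA sort key, NOT the real algorithm:
    # primary = base letters, secondary = combining marks.
    decomposed = unicodedata.normalize("NFD", piece)
    primary = "".join(c.lower() for c in decomposed
                      if not unicodedata.combining(c))
    secondary = tuple(ord(c) for c in decomposed
                      if unicodedata.combining(c))
    return (primary, secondary)

def syllabic_key(word):
    # (C) One weight per syllable; tuple comparison then behaves like a
    # multicolumn sort: first syllable, then second, and so on.
    return tuple(weight(s) for s in split_syllables(word))

def flat_key(word):
    # For contrast: weight the whole string at once, so all primary
    # differences are considered before any accent difference.
    return weight(word.replace("-", ""))

# "\u00e9" is e with acute accent.
words = ["\u00e9-b", "e-c"]
flat_order = sorted(words, key=flat_key)
syllabic_order = sorted(words, key=syllabic_key)
print(flat_order)
print(syllabic_order)
```

With these toy weights the two orderings disagree. The flat key puts
"é-b" first, because b sorts before c at the primary level and the accent
is only consulted afterwards. The syllabic key puts "e-c" first, because
the accent on the first syllable is decided before the second syllable is
looked at. That is the difference the multicolumn picture is getting at.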
While the multicolumn sorting is a natural for databases and the
SQL standard, and while the UCA algorithm can probably be meaningfully
tied to UNICHAR datatype support in the SQL standard, I think the
syllabic parsing aspect is out-of-bounds. That really is a language-specific
issue that needs to be dealt with on a language-by-language and
writing-system-by-writing-system basis, and is not a problem that
ought to be tackled in something like the SQL standard (nor in the UCA,
for that matter).
> I'm afraid you have the wrong bloke here, Maurice. The technicality of my
> query may have fooled you into thinking I'm a UTR#10 expert - far from it!
> All I can do is cc your query to the Unicode list - and wish you luck,
> naturally :-)
> ----- Original Message -----
> From: "Maurice Bauhahn" <firstname.lastname@example.org>
> To: <Mike.Sykes@acm.org>
> Sent: Thursday, February 08, 2001 2:27 PM
> Subject: Unicode collation algorithm - interpretation
> > Hello Mike, from the U.K.!
> > What I have seen of the Unicode collation algorithm makes me wonder
> > it will handle syllabic-based ordering! I specialise in
> > which has (at least) six levels of priority within each syllable.
> > SQL collation will be open to such difficult environments.
> > http://www.bauhahnm.clara.net/KhmerSortingUnicodebeta.pdf
> > Cheers,
> > Maurice Bauhahn