Re: Sort Order

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Dec 04 2003 - 18:17:31 EST

    Mustafa Jabbar inquired:

    > Please also inform me about what will be the sorting for Bangla.
    > Thanks and regards

    The Unicode Standard is *not* a sorting standard -- nor is any
    character encoding.

    The reason it might sometimes seem to be one is that there is
    a long history of people fiddling with the exact details of a
    character encoding, trying to arrange the characters in an order
    such that dumb binary comparison algorithms will produce the
    "correct" results for pairs of strings in that particular encoding.

    The general consensus is, however, that it is impossible to
    accomplish meaningful linguistic sorting simply by tinkering
    around with the character encoding tables. See Section 5.16 of
    the standard for a brief discussion of this issue.
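
    As a small illustration of the point (a sketch only, not taken from
    the standard): Python's default string comparison is a binary
    comparison by code point, so it puts every uppercase ASCII letter
    ahead of every lowercase one. The word list is made up, and
    str.casefold is used here only to show that the expected order
    differs from the binary one; it is not a collation algorithm.

```python
# Binary (code point) order versus the order a reader would expect.
# The sample words are hypothetical.
words = ["apple", "Zebra", "banana"]

# sorted() on str compares code points: "Z" (U+005A) < "a" (U+0061).
binary_order = sorted(words)

# Case folding is a crude stand-in for linguistic comparison, used
# only to make the difference visible.
folded_order = sorted(words, key=str.casefold)

print(binary_order)
print(folded_order)
```

    No rearrangement of the code table can fix this in general, since
    the desired order depends on the language and not only on the
    individual characters.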

    For the related collation standard, see instead:

    http://www.unicode.org/reports/tr10/

    That is the Unicode Collation Algorithm (UCA). That standard explains
    how to accomplish culturally expected sorting and defines an
    algorithm and default table to use for it.

    That *still* is not the answer for how Bangla will be sorted,
    however. You have to make *use* of the Unicode Collation Algorithm,
    starting from its default table, and then tailor that table
    until it produces the results you want.
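
    To make "default table plus tailoring" concrete, here is a toy
    sketch. It is *not* the UCA: the real algorithm uses multi-level
    weights from its default table and also handles normalization,
    contractions, and more. The weight tables, the helper name
    make_sort_key, and the sample words are all hypothetical. It uses
    a well-known non-Bangla case: by default a-umlaut differs from "a"
    only at a secondary level, while Swedish sorts it after "z".

```python
# Toy two-level collation with a tailorable table (NOT the real UCA).
A_UMLAUT = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# Hypothetical primary weights for a-z.
base = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

def make_sort_key(primary, secondary):
    """Return a key function comparing primary weights first,
    then secondary weights as a tie-breaker."""
    def sort_key(s):
        return (tuple(primary[ch] for ch in s),
                tuple(secondary.get(ch, 0) for ch in s))
    return sort_key

# "Default" table: a-umlaut shares the primary weight of "a" and
# differs only at the secondary level.
default_key = make_sort_key({**base, A_UMLAUT: base["a"]}, {A_UMLAUT: 1})

# "Tailored" table: a-umlaut becomes a separate letter after "z".
tailored_key = make_sort_key({**base, A_UMLAUT: base["z"] + 1}, {})

words = ["zebra", A_UMLAUT + "pple", "apple"]
default_order = sorted(words, key=default_key)
tailored_order = sorted(words, key=tailored_key)

print(default_order)
print(tailored_order)
```

    The algorithm is the same in both runs; only the table changes.
    A Bangla collation would be built the same way, by someone who
    knows which order Bangla readers expect.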

    So the question which should be asked is: Has anyone produced
    a UCA-based collation for Bangla, and if so, what behavior
    does it have for sorting Bangla data?

    See also the discussion of sorting issues for Indic languages
    in Cathy Wissink's technical note:

    http://www.unicode.org/notes/tn1/

    --Ken


