[Unicode Announcement] Unicode Collation Algorithm Version 5.2 Released

From: announcements@unicode.org
Date: Wed Oct 21 2009 - 15:12:33 CDT

    Version 5.2 of the Unicode Collation Algorithm has been released.
    See http://www.unicode.org/reports/tr10/.
    This version resynchronizes the Unicode Collation Algorithm with all
    of the updates for the Unicode Standard, Version 5.2. Please note
    the following changes and issues for implementations:

        * The text of UTS #10 has been updated. Among other changes, the
          revised text for UTS #10 makes it clear that the BASE for
          implicit generation of weights for Han characters does not
          include unassigned code points.
        * There are small changes in Gujarati, Telugu, Malayalam
          (including weighting for chillus), Tamil, and Sinhala. While
          these changes move in the direction of expected behavior, good
          results will only come from tailoring for particular languages,
          such as the tailorings provided by CLDR.
        * There have been significant changes to the ordering of many
          combining marks. Many combining marks that are not in customary
          use in modern languages now have the same secondary weight, and
          will only be distinguished on a fourth level, by code point
          ordering. This can be seen by looking at the Unicode Collation
          Charts (http://unicode.org/charts/collation/). In 5.2, many
          characters now have a white background, indicating that they
          sort exactly the same as the previous character unless a fourth
          (code point) level is used.
        * Implementations of UCA should take note that the increased
          number of characters may cause overflows if the implementing
          code makes certain assumptions or optimizations. This can result
          either from the new character additions (which increase the
          number of distinct weights in the table) or because of changes
          in the way the weights, particularly for secondary weight
          values, are assigned in the table. The latter change may result
          in unexpected numbers of characters having the same weight.
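
    As an illustration of the first point above, the following minimal
    sketch derives the two implicit primary weights for a code point using
    the UTS #10 formula (AAAA = BASE + (CP >> 15), BBBB = (CP & 0x7FFF) |
    0x8000). The range tables and function names are assumptions made for
    this example: the ranges are only meant to approximate the assigned Han
    ideographs, they omit the unified ideographs in the CJK Compatibility
    Ideographs block, and a real implementation should derive them from the
    Unicode Character Database. The point shown is that an unassigned code
    point inside a Han block does not get a Han BASE.

```python
# Illustrative ranges only (assumption): assigned Han ideographs,
# approximating Unicode 5.2. Derive the real ranges from the UCD.
CORE_HAN = [(0x4E00, 0x9FCB)]
EXTENSION_HAN = [(0x3400, 0x4DB5), (0x20000, 0x2A6D6), (0x2A700, 0x2B734)]


def in_ranges(cp, ranges):
    """Return True if cp falls inside any (low, high) inclusive range."""
    return any(low <= cp <= high for low, high in ranges)


def implicit_base(cp):
    """Pick BASE; unassigned code points never receive a Han BASE."""
    if in_ranges(cp, CORE_HAN):
        return 0xFB40
    if in_ranges(cp, EXTENSION_HAN):
        return 0xFB80
    return 0xFBC0  # any other code point, including unassigned ones


def implicit_primary_weights(cp):
    """Return the pair (AAAA, BBBB) of implicit primary weights for cp."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


if __name__ == "__main__":
    # U+4E00 is assigned; U+9FFF is treated as unassigned by the tables above.
    for cp in (0x4E00, 0x20000, 0x9FFF):
        aaaa, bbbb = implicit_primary_weights(cp)
        print(f"U+{cp:04X}: {aaaa:04X} {bbbb:04X}")
```

    Keeping the assigned ranges in data rather than in code means that only
    the tables need to change when a later version assigns more ideographs.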

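    The overflow concern in the last point above can be checked
    mechanically. The sketch below assumes a hypothetical implementation
    that packs primary, secondary, and tertiary weights into 16, 8, and 8
    bits; those widths, the function names, and the sample weights in the
    usage lines are invented for illustration and are not taken from the
    UCA data files.

```python
from collections import Counter


def check_weight_widths(elements, bits=(16, 8, 8)):
    """Report levels whose largest weight does not fit the assumed width.

    elements is an iterable of (primary, secondary, tertiary) tuples.
    Returns a list of (level, largest_weight, limit) for overflowing levels.
    """
    elements = list(elements)
    problems = []
    for index, width in enumerate(bits):
        limit = (1 << width) - 1
        largest = max((ce[index] for ce in elements), default=0)
        if largest > limit:
            problems.append((index + 1, largest, limit))
    return problems


def most_shared_secondary(elements):
    """Return (weight, count) for the most common non-zero secondary weight."""
    counts = Counter(ce[1] for ce in elements if ce[1] != 0)
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    # Made-up collation elements, for demonstration only.
    sample = [(0x15EF, 0x20, 0x02), (0, 0x0124, 0x02), (0x1605, 0x20, 0x02)]
    print(check_weight_widths(sample))
    print(most_shared_secondary(sample))
```

    Running such a check against each new version of the weight table
    before shipping would surface both kinds of surprise described above:
    weights that no longer fit, and unusually large groups of characters
    sharing one weight.
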
