[Unicode Announcement] Unicode Collation Algorithm Version 5.2 Released

From: announcements@unicode.org
Date: Wed Oct 21 2009 - 15:12:33 CDT

Version 5.2 of the Unicode Collation Algorithm has been released.
See http://www.unicode.org/reports/tr10/.
This version resynchronizes the Unicode Collation Algorithm with all
of the updates for the Unicode Standard, Version 5.2. Please note
the following changes and issues for implementations:

    * The text of UTS #10 has been updated. Among other changes, the
      revised text for UTS #10 makes it clear that the BASE for
      implicit generation of weights for Han characters does not
      include unassigned code points.
    * There are small changes in Gujarati, Telugu, Malayalam
      (including weighting for chillus), Tamil, and Sinhala. While
      these changes move in the direction of expected behavior, good
      results will only come from tailoring for particular languages,
      such as with CLDR.
    * There have been significant changes to the ordering of many
      combining marks. Many combining marks that are not in customary
      use in modern languages now share a secondary weight and are
      distinguished only on a fourth level, by code point order. The
      Unicode Collation Charts (http://unicode.org/charts/collation/)
      show this: in 5.2, many characters have a white background,
      indicating that they sort exactly the same as the previous
      character unless a fourth (code point) level is used.
    * Implementations of UCA should note that the increased number of
      characters may cause overflows if the implementing code makes
      certain assumptions or optimizations. Overflows can result
      either from the new character additions (which increase the
      number of distinct weights in the table) or from changes in the
      way weights, particularly secondary weights, are assigned in
      the table. The latter change may result in unexpected numbers
      of characters having the same weight.
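
The point about BASE in the first item can be illustrated with a small
sketch of implicit weight derivation as described in UTS #10: a code
point without an explicit table entry gets two primary weights,
AAAA = BASE + (CP >> 15) and BBBB = (CP & 0x7FFF) | 0x8000. The range
tables below are assumptions intended to approximate the Unicode 5.2
Han assignments; they omit the few unified ideographs in the
compatibility block, and should be checked against the Unicode
Character Database before any real use. The function names are
hypothetical.

```python
# Sketch of UTS #10 implicit primary weights. The range tables are
# illustrative assumptions, not data taken from the announcement.

# Assumed assigned Han range in the core CJK block (BASE FB40).
CORE_HAN = [(0x4E00, 0x9FCB)]
# Assumed assigned Han ranges in the extension blocks (BASE FB80).
EXT_HAN = [(0x3400, 0x4DB5), (0x20000, 0x2A6D6), (0x2A700, 0x2B734)]


def in_ranges(cp, ranges):
    """True if cp falls inside any inclusive (low, high) range."""
    return any(low <= cp <= high for low, high in ranges)


def implicit_base(cp):
    """Pick BASE; unassigned code points never get a Han BASE."""
    if in_ranges(cp, CORE_HAN):
        return 0xFB40
    if in_ranges(cp, EXT_HAN):
        return 0xFB80
    return 0xFBC0  # everything else, including unassigned code points


def implicit_primaries(cp):
    """Return the (AAAA, BBBB) pair of implicit primary weights."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


for cp in (0x4E00, 0x20000, 0x9FFF):
    aaaa, bbbb = implicit_primaries(cp)
    print(f"U+{cp:04X} -> {aaaa:04X} {bbbb:04X}")
```

Under these assumed ranges, U+9FFF lies inside the CJK Unified
Ideographs block but outside the assigned range, so it takes BASE FBC0
rather than FB40.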
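
The combining-mark item can be sketched with a toy sort key. The
weights below are invented for illustration and are not taken from the
UCA table; the two marks were picked only as examples of marks outside
customary modern use. The sketch builds a three-level key and
optionally appends the code points as a fourth level. A real
implementation would normalize the input first, which this sketch
skips.

```python
# Toy collation elements: (primary, secondary, tertiary).
# All weight values here are hypothetical.
TOY_WEIGHTS = {
    "a": (0x15EF, 0x0020, 0x0002),
    "\u0363": (0x0000, 0x00FF, 0x0002),  # COMBINING LATIN SMALL LETTER A
    "\u0364": (0x0000, 0x00FF, 0x0002),  # COMBINING LATIN SMALL LETTER E
}


def sort_key(text, code_point_level=False):
    """Build a sort key; optionally add a fourth, code point level."""
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = []
    for level in range(3):
        # Zero weights are ignorable at that level and are skipped.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
        key.append(0)  # level separator
    if code_point_level:
        key.extend(ord(ch) for ch in text)
    return tuple(key)


s1, s2 = "a\u0363", "a\u0364"
print(sort_key(s1) == sort_key(s2))              # three levels only
print(sort_key(s1, True) < sort_key(s2, True))   # with code point level
```

With three levels the two strings produce identical keys, so their
relative order is left to the sort routine; with the code point level
the order becomes deterministic.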
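
For the overflow item, one defensive measure is to validate a weight
table against whatever packing assumptions the implementation makes
before using it. The sketch below is only an example of such a check:
the bit widths, the function name, and the sample table are all
hypothetical, and a real implementation would load the actual table
data instead.

```python
from collections import Counter

# Hypothetical packing assumption: bits reserved per weight level
# (primary, secondary, tertiary).
BITS = (16, 8, 6)


def check_table(table, bits=BITS):
    """Return (overflows, largest_group) for a code point -> weights map.

    overflows lists (code point, level, weight) for weights that do not
    fit the assumed width; largest_group is the largest number of
    primary-ignorable entries sharing one secondary weight.
    """
    overflows = []
    for cp, ce in table.items():
        for level, (weight, width) in enumerate(zip(ce, bits), start=1):
            if weight >= (1 << width):
                overflows.append((cp, level, weight))
    shared = Counter(
        ce[1] for ce in table.values() if ce[0] == 0 and ce[1] != 0
    )
    return overflows, max(shared.values(), default=0)


# Made-up sample entries, not real UCA weights.
SAMPLE = {
    0x0301: (0x0000, 0x0032, 0x0002),
    0x0363: (0x0000, 0x01FF, 0x0002),
    0x0364: (0x0000, 0x01FF, 0x0002),
}

overflows, largest_group = check_table(SAMPLE)
print(overflows, largest_group)
```

Running a check like this when a new table version is loaded reports
both kinds of surprise mentioned above: weights too large for the
assumed field, and unusually large groups of characters with the same
secondary weight.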

----
All of the Unicode Consortium lists are strictly opt-in lists for members
or interested users of our standards. We make every effort to remove
users who do not wish to receive e-mail from us. To see why you are getting
this mail and how to remove yourself from our lists if you want, please
see http://www.unicode.org/consortium/distlist.html#announcements

