Update to UAX #29 now available

From: Rick McGowan (rick@unicode.org)
Date: Wed Mar 05 2008 - 10:53:51 CST

    There is a new draft of the Proposed Update to Unicode Standard Annex
    #29 Unicode Text Segmentation, reflecting changes authorized in the
    last UTC meeting:

    * Sentence Segmentation. Revised the contents of SContinue,
      characters that 'continue' a sentence.
    * Word Segmentation. Added Newline, and rules WB3a and WB3b to break
      words within other newline sequences.
    * Grapheme Cluster Segmentation.
        * Added Prepend and rule GB9b to handle Thai and Lao.
        * Major revision of Section 3 Grapheme Cluster Boundaries.
          Includes change of name to extended grapheme cluster, clearer
          distinction from legacy grapheme clusters, and significant
          reordering and enhancement of the text.
        * Note that the GraphemeBreakTest file in the UCD now tests the
          extended grapheme clusters, since they are the recommended choice.
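
    As a rough illustration of the word segmentation change listed above,
    the following Python sketch applies only the newline-related rules:
    WB3 (no break between CR and LF), WB3a (break after Newline, CR, or
    LF), and WB3b (break before Newline, CR, or LF). The function name
    and the hard-coded Newline set are assumptions made for this example
    and should be checked against the data files; all other word break
    rules are ignored.

```python
# Hypothetical helper for illustration; not part of any Unicode library.
CR, LF = "\r", "\n"

# Assumed members of Word_Break=Newline: VT, FF, NEL, LS, PS.
# Verify against WordBreakProperty.txt before relying on this set.
NEWLINE = {"\u000B", "\u000C", "\u0085", "\u2028", "\u2029"}


def is_newline_like(ch):
    """True if ch is CR, LF, or in the assumed Newline set."""
    return ch == CR or ch == LF or ch in NEWLINE


def newline_word_breaks(text):
    """Return interior offsets where WB3a or WB3b forces a word break.

    Only rules WB3, WB3a, and WB3b are modelled; positions governed by
    the remaining word break rules are simply not reported.
    """
    breaks = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if before == CR and after == LF:
            continue  # WB3: CR x LF, keep the pair together
        if is_newline_like(before) or is_newline_like(after):
            breaks.append(i)  # WB3a (break after) or WB3b (break before)
    return breaks


if __name__ == "__main__":
    print(newline_word_breaks("ab\r\ncd"))
```

    For the string "ab", CR, LF, "cd", the sketch reports breaks at
    offsets 2 and 4, that is, before and after the CR LF pair but not
    inside it.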

    The UAX document is at http://www.unicode.org/reports/tr29/tr29-12.html.

    The data files are in http://www.unicode.org/Public/5.1.0/ucd/auxiliary/.

    The HTML charts are at:


    (The d numbers may be updated over the next month, so if these links
    don't work, go first to the directory.)

    Unicode 5.1.0 is currently in the pre-publication phase and is due
    for release at the end of March 2008. No more substantive changes
    are planned, beyond those already approved by the Unicode Technical
    Committee. However, if you have editorial comments on the text of
    Unicode 5.1.0, including this document, please report them via the online
    reporting form (http://www.unicode.org/reporting.html).
