re: ? Wrong definitions for combining character sequence in tr 29

From: verdy_p (
Date: Mon Nov 23 2009 - 23:04:04 CST

    The definition is correct, and explained in the table which says "A single base character is **not** a combining
    character sequence."

    The table makes distinctions between the four cases, defined without overlaps, that can make (when joined
    **together** in a union) a single grapheme cluster.

    Your conclusion is wrong, because a single letter 'A' is defined as a "legacy grapheme cluster", and a "legacy
    grapheme cluster" ***is*** a grapheme cluster:

    ( CRLF
    | ( Hangul-syllable | !Control )
    | . )

    because it matches "!Control". The same row in the table says that "A single base character is a grapheme cluster".
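
To make the distinction concrete, here is a minimal Python sketch of the two definitions being compared. It is a simplification under stated assumptions, not the TR 29 algorithm: the helper names are hypothetical, "continuing character" is approximated as General Category M* plus ZWJ/ZWNJ, "Control" as Cc, Cf, Zl, Zp (minus ZWJ/ZWNJ), "base" as anything that is neither, and the Hangul-syllable branch is left out.

```python
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"


def is_continuing(ch):
    # Assumption: Mark | ZWJ | ZWNJ, approximated via General Category M*.
    return ch in (ZWNJ, ZWJ) or unicodedata.category(ch).startswith("M")


def is_control(ch):
    # Assumption: Control approximated as Cc, Cf, Zl, Zp, excluding ZWJ/ZWNJ.
    return ch not in (ZWNJ, ZWJ) and unicodedata.category(ch) in ("Cc", "Cf", "Zl", "Zp")


def is_base(ch):
    # Assumption: a base is anything that is neither continuing nor control.
    return not is_continuing(ch) and not is_control(ch)


def is_combining_character_sequence(s):
    # base? ( Mark | ZWJ | ZWNJ )+  : at least one continuing character is required.
    rest = s[1:] if s and is_base(s[0]) else s
    return len(rest) > 0 and all(is_continuing(c) for c in rest)


def is_legacy_grapheme_cluster(s):
    # Simplified: CRLF | !Control followed by zero or more continuing characters | any single character.
    if s == "\r\n" or len(s) == 1:
        return True
    return len(s) > 1 and not is_control(s[0]) and all(is_continuing(c) for c in s[1:])


for text in ("A", "A\u0301", "\u0301", "\r\n"):
    print(repr(text), is_combining_character_sequence(text), is_legacy_grapheme_cluster(text))
```

Under these assumptions, a lone 'A' fails the combining character sequence test (no mark follows it) but passes the legacy grapheme cluster test, which is exactly the point made above.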

    This is also stated earlier, in section 3, just below Table 1a:
    "A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters."

    The "legacy grapheme clusters" are the simplest and most common forms of grapheme clusters, recognized in almost all
    applications. Don't interpret "legacy" as meaning "included just for compatibility" or "still supported but
    not recommended"; it just means the most restrictive definition, the one used in most legacy applications that
    don't recognize the other forms.

    The same can be said about the extended grapheme clusters, which **are** also grapheme clusters.


    > Message of 24/11/09 03:05
    > From: "karl williamson"
    > To: ""
    > Cc:
    > Subject: ? Wrong definitions for combining character sequence in tr 29
    > It is defined as
    > base? ( Mark | ZWJ | ZWNJ )+
    > That means that a mark is required. So the letter 'A' is not a grapheme
    > cluster.
    > Similarly for the definition for the extended
