Re: ? Wrong definitions for combining character sequence in tr 29

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 24 2009 - 19:05:26 CST

    Karl Williamson wrote:

    > Thanks for your reply. I'm afraid I'm still confused.
    >
    > The sentence before Table 1b is the first mention in this document of
    > combining character sequences; it would be nice if it discussed what
    > they were, and why even mention them at all? In the past, I just
    > presumed they were an earlier concept that was superseded by grapheme
    > clusters.

    It is an earlier concept. But it is not superseded by grapheme
    clusters.

    >
    > They are discussed some in 3.6 of the actual standard, and here there
    > seem to me to be contradictions:
    >
    > "• A grapheme cluster is similar, but not identical to a combining
    > character sequence. A combining character sequence starts with a base
    > character and extends across any subsequent sequence of combining marks,
    > nonspacing or spacing. A combining character sequence is most directly
    > relevant to processing issues related to normalization, comparison, and
    > searching.
    > • A grapheme cluster starts with a grapheme base and extends across any
    > subsequent sequence of nonspacing marks. A grapheme cluster is most
    > directly relevant to text rendering and such processes as cursor
    > placement and text selection in editing."
    >
    > This seems to me to imply that a base character is always the first item
    > of a combining character sequence,

    Usually, yes, but not definitionally. Read D56 and D57 carefully.
    A *defective* combining character sequence doesn't start with
    a base character, but is a combining character sequence nonetheless.

    > and the word 'any' seems to me to
    > imply 0 or more marks following it.

    For a grapheme cluster, yes. A single base character *is*
    a grapheme cluster. It is *not* a combining character sequence.
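
    The three cases above (a base followed by marks, marks with no base,
    and a lone base) can be sketched roughly in Python. This is only an
    approximation of D51, D56 and D57 under stated assumptions: the names
    is_extender, is_base and classify are hypothetical helpers, "base
    character" is approximated by General Category L, N, P, S or Zs, and
    "combining character" by General Category M*. It is not a conformant
    implementation of the definitions.

```python
import unicodedata

ZWJ, ZWNJ = "\u200d", "\u200c"

def is_extender(ch):
    # Assumption: a combining character is any General Category M*
    # (Mn, Mc, Me); ZWJ and ZWNJ are also allowed inside a sequence.
    return unicodedata.category(ch).startswith("M") or ch in (ZWJ, ZWNJ)

def is_base(ch):
    # Assumption: a base character is a graphic character that is not
    # a combining mark, approximated here as L*, N*, P*, S* or Zs.
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"

def classify(text):
    """Split text into runs labelled 'ccs', 'defective' or 'none'."""
    out = []
    i, n = 0, len(text)
    while i < n:
        start = i
        if is_base(text[i]):
            i += 1                      # consume the base character
        j = i
        while j < n and is_extender(text[j]):
            j += 1                      # consume following marks / ZWJ / ZWNJ
        if j > i:
            # Marks are present: with a base it is a combining character
            # sequence, without one it is a defective sequence.
            out.append((text[start:j], "ccs" if i > start else "defective"))
            i = j
        elif i > start:
            # A lone base character: not a combining character sequence.
            out.append((text[start:i], "none"))
        else:
            # Neither base nor mark (e.g. a control character).
            out.append((text[i], "none"))
            i += 1
    return out

if __name__ == "__main__":
    for run, label in classify("\u0301ae\u0301"):
        print([f"U+{ord(c):04X}" for c in run], label)
```

    In this sketch a leading U+0301 with nothing before it is labelled
    defective, a bare "a" is labelled as no combining character sequence
    at all, and "e" followed by U+0301 is labelled as a combining
    character sequence.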

    > And this doesn't help me understand why there is the concept of a
    > combining character sequence and why that is more relevant than a
    > grapheme cluster to normalization, comparison, and searching.

    Normalization is not defined in terms of grapheme clusters.
    Grapheme clusters are about segmentation issues in text (which
    is why they are defined in UAX #29, the UAX about text segmentation).

    Normalization, on the other hand, is *definitionally* concerned
    with combining character sequences, because at the core
    of normalization is the canonical ordering of sequences of
    combining marks. See the Canonical Ordering Algorithm subsection
    of Section 3.11 Normalization Forms in the latest posted
    version of the standard.
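
    As a small illustration of why normalization works on sequences of
    combining marks, the following Python sketch uses the standard
    unicodedata module to show canonical reordering. The helper name
    combining_classes and the sample string are chosen for illustration
    only; the behavior shown is that of Python's normalize(), not a
    reimplementation of the algorithm in the standard.

```python
import unicodedata

def combining_classes(text):
    # Canonical_Combining_Class of each code point (0 for starters).
    return [unicodedata.combining(ch) for ch in text]

# "a" + COMBINING ACUTE ACCENT (ccc 230) + COMBINING DOT BELOW (ccc 220)
above_then_below = "a\u0301\u0323"
# The same marks typed in the opposite order.
below_then_above = "a\u0323\u0301"

nfd_1 = unicodedata.normalize("NFD", above_then_below)
nfd_2 = unicodedata.normalize("NFD", below_then_above)

if __name__ == "__main__":
    print(combining_classes(above_then_below))  # classes before reordering
    print(combining_classes(nfd_1))             # classes after reordering
    print(nfd_1 == nfd_2)                       # both orders normalize alike
```

    The two input strings differ only in the order of their marks, and
    after NFD the marks appear in ascending combining class order in
    both, so the strings compare equal.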

    --Ken


