Re: ? Wrong definitions for combining character sequence in tr 29

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 24 2009 - 19:05:26 CST

Next message: karl williamson: "Re: ? Wrong definitions for combining character sequence in tr 29"

Previous message: karl williamson: "Re: ? Wrong definitions for combining character sequence in tr 29"
Maybe in reply to: karl williamson: "? Wrong definitions for combining character sequence in tr 29"
Next in thread: karl williamson: "Re: ? Wrong definitions for combining character sequence in tr 29"
Reply: karl williamson: "Re: ? Wrong definitions for combining character sequence in tr 29"

Karl Williamson wrote:

> Thanks for your reply. I'm afraid I'm still confused.
>
> The sentence before Table 1b is the first mention in this document of
> combining character sequences; it would be nice if it discussed what
> they were, and why even mention them at all? In the past, I just
> presumed they were an earlier concept that was superseded by grapheme
> clusters.

It is an earlier concept. But it is not superseded by grapheme
clusters.

>
> They are discussed some in 3.6 of the actual standard, and here there
> seem to me to be contradictions:
>
> "• A grapheme cluster is similar, but not identical to a combining
> character sequence. A combining character sequence starts with a base
> character and extends across any subsequent sequence of combining marks,
> nonspacing or spacing. A combining character sequence is most directly
> relevant to processing issues related to normalization, comparison, and
> searching.
> • A grapheme cluster starts with a grapheme base and extends across any
> subsequent sequence of nonspacing marks. A grapheme cluster is most
> directly relevant to text rendering and such processes as cursor
> placement and text selection in editing."
>
> This seems to me to imply that a base character is always the first item
> of a combining character sequence,

Usually, yes, but not definitionally. Read D56 and D57 carefully.
A *defective* combining character sequence doesn't start with
a base character, but is a combining character sequence nonetheless.

> and the word 'any' seems to me to
> imply 0 or more marks following it.

For a grapheme cluster, yes. A single base character *is*
a grapheme cluster. It is *not* a combining character sequence.
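
A minimal sketch of the distinction drawn from D56 and D57, assuming a
simplified reading of those definitions: a combining character is
anything with General_Category M, a base character is approximated as
any other graphic character, and ZWJ/ZWNJ may appear among the marks.
The helper names (is_combining, is_base, classify) are hypothetical,
and the function classifies a whole string in isolation rather than
finding maximal sequences inside running text.

```python
import unicodedata

JOINERS = {"\u200c", "\u200d"}  # ZWNJ and ZWJ

def is_combining(ch):
    # Combining character: General_Category Mn, Mc, or Me.
    return unicodedata.category(ch).startswith("M")

def is_base(ch):
    # Approximation of "base character": a graphic character
    # (letter, number, punctuation, symbol, space separator)
    # that is not a combining mark.
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"

def is_mark_run(s):
    # One or more combining characters, ZWJ, or ZWNJ.
    return len(s) > 0 and all(is_combining(c) or c in JOINERS for c in s)

def classify(seq):
    # Base followed by at least one mark: combining character sequence.
    if seq and is_base(seq[0]) and is_mark_run(seq[1:]):
        return "combining character sequence"
    # Marks with no leading base: defective, but still a
    # combining character sequence.
    if is_mark_run(seq):
        return "defective combining character sequence"
    return "not a combining character sequence"

print(classify("e\u0301"))  # base + COMBINING ACUTE ACCENT
print(classify("\u0301"))   # lone combining mark
print(classify("e"))        # lone base character
```

Under these assumptions the lone "e" falls into the last category,
which matches the point above: it is a grapheme cluster, but not a
combining character sequence.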

> And this doesn't help me understand why there is the concept of a
> combining character sequence and why that is more relevant than a
> grapheme cluster to normalization, comparison, and searching.

Normalization is not defined in terms of grapheme clusters.
Grapheme clusters are about segmentation issues in text (which
is why they are defined in UAX #29, the UAX about text segmentation).

Normalization, on the other hand, is *definitionally* concerned
with combining character sequences, because at the core
of normalization is the canonical ordering of sequences of
combining marks. See the Canonical Ordering Algorithm subsection
of Section 3.11 Normalization Forms in the latest posted
version of the standard.
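
As a rough illustration of why canonical ordering operates on sequences
of combining marks, here is a sketch in Python. It assumes the input is
already fully decomposed, and the function name canonical_order is
hypothetical. It stably sorts each run of characters with non-zero
Canonical_Combining_Class, which is the effect of the Canonical
Ordering Algorithm, and compares the result with the standard library's
unicodedata.normalize.

```python
import unicodedata

def canonical_order(s):
    # Reorder each run of marks with non-zero combining class by
    # ascending class. sorted() is stable, so marks of the same
    # class keep their relative order.
    out = []
    run = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter (class 0) ends the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# a + COMBINING ACUTE ACCENT (class 230) + COMBINING DOT BELOW (class 220)
text = "a\u0301\u0323"
reordered = canonical_order(text)
print([f"U+{ord(c):04X}" for c in reordered])
print(reordered == unicodedata.normalize("NFD", text))
```

The dot below moves ahead of the acute accent because its combining
class is lower; two marks of the same class would be left in their
original order.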

--Ken


This archive was generated by hypermail 2.1.5 : Tue Nov 24 2009 - 19:09:17 CST