Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Dec 12 2003 - 07:49:52 EST

Next message: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Previous message: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
In reply to: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: jon@hackcraft.net: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

On 12/12/2003 04:13, jon@hackcraft.net wrote:

>>Thank you. I was supposing that isolated combining marks were considered
>>in some way defective,
>
><blockquote cite="http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf">
>D17a: Defective combining character sequence: A combining character sequence
>that does not start with a base character.
>
>[Explanatory Note] Defective combining character sequences occur when a
>sequence of combining characters appears at the start of a string or follows
>a control or format character. Such sequences are defective from the point
>of view of handling of combining marks, but are not ill-formed.
></blockquote>
>
>"in some way defective" is actually a good way to put it methinks, they aren't
>illegal, and in some cases you can do things with them that are both reasonable
>and useful, but in other situations they may be problematic.
>
Indeed. But I was thinking more in terms of grapheme clusters, as
defined in UAX #29. Is a defective combining sequence a grapheme
cluster? Probably not, according to the definition "what the user thinks
of as a character or basic unit of the language". But the boundary rule
"/Break at the start and end of text./" implies that the algorithm will
count a defective combining sequence at the start of text (and possibly
what follows) as a default grapheme cluster. So it is "in some way
defective" both as a grapheme cluster and as a character sequence.
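
A rough sketch in Python of how this plays out. The helper
approx_clusters is hypothetical and is not the UAX #29 algorithm: it
only attaches characters of General_Category M* to the character before
them, and ignores the rules for CR LF, Hangul syllables and so on. Under
that assumption, a combining mark at the start of text has nothing to
attach to, so it ends up counted as a unit of its own.

```python
import unicodedata


def approx_clusters(text):
    """Group each character with the combining marks that follow it.

    Simplified stand-in for default grapheme clusters: a character whose
    General_Category starts with "M" is appended to the previous group.
    A mark at the very start of the text has no previous group, so it
    opens a group by itself (the "defective" case).
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch      # attach the mark to its base
        else:
            clusters.append(ch)     # base character, or a leading mark
    return clusters


defective = "\u0301abc"    # COMBINING ACUTE ACCENT with no base before it
well_formed = "e\u0301x"   # e + COMBINING ACUTE ACCENT, then x

print(len(approx_clusters(defective)))    # 4
print(len(approx_clusters(well_formed)))  # 2
```

In this sketch the leading mark is a unit on its own; whether it should
also absorb what follows is exactly the kind of question the boundary
rules have to settle.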

I note the following in UAX #29, which backs up my comments on functions
to count characters:

> In those rare circumstances where end-users need character counts, the
> counts should correspond to the grapheme cluster boundaries.

This implies that end users should not require counts of code units or
code points.
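
A small illustration of why such counts are of little use to end users,
sketched in Python. The function name counts is made up for this
example. It takes two canonically equivalent spellings of the same
user-perceived character (e with acute accent) and shows that the code
point and code unit counts differ between them.

```python
import unicodedata

# Two canonically equivalent spellings of the same letter.
composed = unicodedata.normalize("NFC", "e\u0301")    # U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # U+0065 U+0301


def counts(text):
    """Return (code points, UTF-16 code units, UTF-8 code units)."""
    return (
        len(text),                          # Python 3 str length is in code points
        len(text.encode("utf-16-le")) // 2, # two bytes per UTF-16 code unit
        len(text.encode("utf-8")),          # one byte per UTF-8 code unit
    )


print(counts(composed))     # (1, 1, 2)
print(counts(decomposed))   # (2, 2, 3)
```

Both strings would be a single grapheme cluster, which is the only one
of these numbers that stays the same across the two spellings.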

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 08:47:03 EST