Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: jon@hackcraft.net
Date: Fri Dec 12 2003 - 08:43:53 EST


    Quoting Peter Kirk <peterkirk@qaya.org>:

    [snip me quoting D17a]
    > >
    > > "In some way defective" is actually a good way to put it, methinks. They
    > > aren't illegal, and in some cases you can do things with them that are
    > > both reasonable and useful, but in other situations they may be
    > > problematic.
    > >
    > Indeed. But I was thinking more in terms of grapheme clusters, as
    > defined in UAX #29. Is a defective combining sequence a grapheme
    > cluster? Probably not according to the definition "what the user thinks
    > of as a character or basic unit of the language". But the boundary rule
    > "Break at the start and end of text." implies that the algorithm will
    > count a defective combining sequence at the start of text (and possibly
    > what follows) as a default grapheme cluster. So it is "in some way
    > defective" as a grapheme cluster as well as as a character sequence.

    My understanding is that it would be counted as a default grapheme cluster,
    but I agree it doesn't match "what the user thinks of as a character" very
    well. So it's a grapheme cluster, but it's "in some way defective" :)
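To make the start-of-text case concrete, here is a rough Python sketch. It is not UAX #29: it assumes, purely for illustration, a single simplified rule in which any code point of general category Mn, Mc or Me extends the previous cluster, and it ignores the other boundary rules (CR LF, Hangul syllables, ZWJ, regional indicators, prepend and so on). The names `is_mark` and `simple_clusters` are hypothetical. With only that rule plus "break at the start of text", a leading combining mark has nothing to attach to, so it ends up as a cluster of its own.

```python
from __future__ import annotations

import unicodedata


def is_mark(ch: str) -> bool:
    # Simplification (assumption): treat general categories Mn, Mc and Me
    # as extending the previous cluster. Real UAX #29 uses the
    # Grapheme_Cluster_Break property and several more rules.
    return unicodedata.category(ch).startswith("M")


def simple_clusters(text: str) -> list[str]:
    """Hypothetical, simplified segmentation: a base plus following marks."""
    clusters: list[str] = []
    for ch in text:
        if clusters and is_mark(ch):
            # A mark with a preceding cluster attaches to it.
            clusters[-1] += ch
        else:
            # The start of text always breaks, so a leading mark (a defective
            # combining sequence) starts a cluster of its own here.
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    print(len(simple_clusters("e\u0301")))        # 1: e + combining acute
    print(len(simple_clusters("\u0301a")))        # 2: lone acute, then a
    print(len(simple_clusters("\u0301\u0300a")))  # 2: both leading marks share one cluster
```

The lone U+0301 at the start comes out as a "cluster" only because of the start-of-text break, which is the sense in which it is a grapheme cluster that is still "in some way defective".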

    > I note the following in UAX #29, which backs up my comments on functions
    > to count characters:
    >
    > > In those rare circumstances where end-users need character counts, the
    > > counts should correspond to the grapheme cluster boundaries.
    >
    > This implies that end users should not require counts of code units or
    > code points.

    I don't think anyone argued against this being what *end* users require.
    Certainly for small values of "end" anyway.
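On the counting point, a second rough Python sketch shows why code-unit and code-point counts are poor proxies for what an end user sees. The canonically equivalent spellings U+00E9 and U+0065 U+0301 give different code-unit and code-point counts but the same cluster count. The cluster count here assumes a simplified rule for illustration (a code point of general category Mn, Mc or Me extends the previous cluster, anything else starts a new one), not full UAX #29, and `counts` is a hypothetical helper.

```python
from __future__ import annotations

import unicodedata


def counts(text: str) -> tuple[int, int, int, int]:
    """Return (UTF-8 code units, UTF-16 code units, code points, clusters)."""
    utf8_units = len(text.encode("utf-8"))
    # "utf-16-le" writes no BOM, and each UTF-16 code unit is two bytes.
    utf16_units = len(text.encode("utf-16-le")) // 2
    code_points = len(text)
    # Simplified cluster count (assumption, not full UAX #29): every non-mark
    # starts a cluster, and the start of text starts one even if the first
    # code point is a mark.
    clusters = sum(
        1
        for i, ch in enumerate(text)
        if i == 0 or not unicodedata.category(ch).startswith("M")
    )
    return utf8_units, utf16_units, code_points, clusters


if __name__ == "__main__":
    precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
    # The two are canonically equivalent: NFC maps one onto the other.
    assert unicodedata.normalize("NFC", decomposed) == precomposed
    print(counts(precomposed))  # (2, 1, 1, 1)
    print(counts(decomposed))   # (3, 2, 2, 1)
```

Only the last number agrees for both spellings, which fits UAX #29's advice that end-user character counts should follow grapheme cluster boundaries.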

    --
    Jon Hanna                   | Toys and books
    <http://www.hackcraft.net/> | for hospitals:
                                | <http://santa.boards.ie>
    


    This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 09:27:53 EST