RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Dec 12 2003 - 12:06:59 EST


    > -----Original Message-----
    > From: jon@hackcraft.net [mailto:jon@hackcraft.net]
    > Sent: Friday, December 12, 2003 17:28
    > To: verdy_p@wanadoo.fr
    > Subject: RE: Text Editors and Canonical Equivalence (was Coloured
    > diacritics)
    >
    >
    > Quoting Philippe Verdy <verdy_p@wanadoo.fr>:
    >
    > > Peter Kirk wrote:
    > > > On 12/12/2003 04:31, Michael Everson wrote:
    > > > > At 12:17 +0000 2003-12-12, Arcane Jill wrote:
    > > > >> And what, I find myself wondering, does "nearly infinite" mean?
    > > > > It means "finite".
    > > >
    > > > Except in the original context it should have meant "infinite",
    > > > as there is actually an infinite number of potential default
    > > > grapheme clusters.
    > >
    > > I really meant "nearly infinite", because even if the potential default
    > > grapheme clusters are "infinite", the actual ones that have meaningful
    > > semantics and effective usage are "finite" (within the finite set of
    > > code points currently assigned in a precise Unicode version), but
    > > currently not precisely enumerable (that's where "nearly infinite"
    > > makes sense).
    >
    > Any rules for sanity-checking grapheme clusters would be at a higher
    > level than Unicode, so it is indeed infinite, or at least until the
    > real world gets in the way with a power cut or such.
    >
    > #include <ostream>
    >
    > void outputInfiniteValidGraphemeCluster(std::wostream& wos)
    > {
    >     wos.put(L'a');           // base letter
    >     for (;;)
    >         wos.put(L'\u0300');  // COMBINING GRAVE ACCENT, never terminates
    > }

    I know that your function will create an infinite grapheme cluster
    that is nonetheless valid in Unicode. Whether it has any meaning is
    very doubtful. In fact, I doubt that any language will ever accept
    more than two occurrences of the same character in the same
    grapheme cluster.

    So in reality, the following DGCs will be valid and acceptable:
            "a",
            "a\u0300",
            "a\u0300\u0300"
    but then the following DGCs will be valid but unacceptable:
            "a\u0300\u0300\u0300"
            "a\u0300\u0300\u0300\u0300"
            "a\u0300\u0300\u0300\u0300\u0300"
            ...
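
    The distinction above could be sketched as a check layered on top of
    Unicode. This is only an illustration: the names makeCluster and
    withinRepeatLimit are hypothetical, the limit of two is my own
    conjecture rather than anything the standard defines, and the input
    is assumed to be a single cluster already split into code points.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: builds "a" followed by `graves` copies of
// U+0300 COMBINING GRAVE ACCENT, i.e. one default grapheme cluster.
std::u32string makeCluster(std::size_t graves)
{
    std::u32string cluster(1, U'a');
    cluster.append(graves, U'\u0300');
    return cluster;
}

// Returns true if no code point occurs more than `maxRepeats` times in
// the cluster. The default of 2 is an assumption taken from the message
// above, not a rule of the Unicode Standard.
bool withinRepeatLimit(const std::u32string& cluster,
                       std::size_t maxRepeats = 2)
{
    std::map<char32_t, std::size_t> counts;
    for (char32_t cp : cluster) {
        if (++counts[cp] > maxRepeats)
            return false;  // valid in Unicode, but rejected by this policy
    }
    return true;
}
```

    With these definitions, withinRepeatLimit(makeCluster(2)) accepts
    "a\u0300\u0300", while withinRepeatLimit(makeCluster(3)) rejects
    "a\u0300\u0300\u0300", although both remain valid clusters.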





