From: Mark Davis
Date: Mon Feb 23 2009 - 13:57:01 CST


    What I've been trying to say is that a word with a single non-NFC sequence
    *would be* a typical, non-contrived "worst" case in terms of
    performance. Words with multiple non-NFC sequences are a vanishingly small
    proportion of the web.

       - If you want a very worst case (but completely unlikely in practice,
       except perhaps maliciously), something like the 999,999 combining
       characters in reverse CCC order (quoted below) works.
       - If you want a typical, uncontrived, worst case, something like
       "No\u0308rmalization" works well.
       - If you want something between those, figure out what you mean, because
       I don't know of any better example.
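
To make the two ends of that range concrete, here is a rough Python sketch using the standard `unicodedata` module. The helper names (`reverse_ccc_worst_case`, `is_nfc`), the choice of U+0300..U+036F as the pool of combining marks, and the scaled-down length of 1,000 marks are assumptions made for illustration; they are not taken from the FAQ.

```python
import unicodedata


def reverse_ccc_worst_case(n_marks: int, base: str = "a") -> str:
    """Build a base character followed by n_marks combining marks,
    ordered from highest to lowest canonical combining class (CCC)."""
    # Assumption: draw marks from the Combining Diacritical Marks block,
    # keeping only code points whose CCC is non-zero.
    pool = [chr(cp) for cp in range(0x0300, 0x0370)
            if unicodedata.combining(chr(cp)) != 0]
    marks = [pool[i % len(pool)] for i in range(n_marks)]
    # Reverse CCC order: the normalizer has to reorder every mark.
    marks.sort(key=unicodedata.combining, reverse=True)
    return base + "".join(marks)


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# Typical, uncontrived case: one non-NFC sequence inside an ordinary word.
typical = "No\u0308rmalization"
typical_nfc = unicodedata.normalize("NFC", typical)

# Contrived case, scaled down from 999,999 marks so it runs quickly.
worst = reverse_ccc_worst_case(1_000)
worst_nfd = unicodedata.normalize("NFD", worst)
```

In the typical case the normalizer only has to compose one pair (o + U+0308 becomes U+00F6). In the contrived case every combining mark is out of canonical order, so the whole run has to be reordered before anything else can happen. Raising `n_marks` toward 999,999 and timing the `normalize` call is one way to see how a given implementation copes.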


    On Mon, Feb 23, 2009 at 10:42, Asmus Freytag wrote:

    > On 2/23/2009 10:01 AM, Mark Davis wrote:
    >> The worst performance would be (in the 1M character example I've been
    >> using), something like a base character followed by a list of 999,999
    >> characters with CCC != 0, sorted by CCC in reverse order. I added a note to
    >> this effect.
    > No, the worst case would be the 2M example...
    > Actually, the problem with such kind of examples is that they don't speak
    > to what you can realistically expect in non-contrived situations.
    > A./
