    A couple of weeks ago, in this thread, Philippe Verdy said:

    > Breaking on words, even if it requires a very modest buffering,
    > will significantly improve the processing time,
    > because each word in the long texts will be scanned only
    > once, and all the rest will occur within the small and
    > constantly reused buffer.
    > I don't forget that in most practical cases, sorts will operate
    > on texts whose collation keys have been only partly
    > generated and truncated, because they really speed up and
    > reduce the number of compares to perform ...

    and so on.

    Instead of continuing the discussion with a back and forth in
    email, I decided to write a Unicode Technical Note
    on the general topic, including a case study of alternative
    orderings for a French topic list.

    Those who are interested in collation and in the particular issues
    that were discussed in this thread may wish to take a look:


