From: Peter Kirk
Date: Wed Mar 17 2004 - 14:53:23 EST

    On 17/03/2004 11:30, Ernest Cline wrote:

    >Mixed Turkish and other European language documents that are without
    >language markup have the same problem, no matter where the burden
    >is placed. Some I's will receive inappropriate glyphs when a casing rule
    >is applied. The problem is just as pronounced with either method, and
    >the need to rewrite such documents to ensure proper casing is the same.
    >I will admit that my preferred solution has higher initial costs, but lower
    >long term costs that cause me to favor it. In any case, changing to my
    >preferred solution now would not be worth the confusion that would be
    >caused. If there ever is a successor to Unicode, then it would be worth
    >examining this idea, but such an event is at least twenty years away.

    Your preferred solution has advantages only if the long-term costs are
    real. But how often is it necessary to apply casing rules to existing
    documents? Quite rarely, I would think. Search engines might want to, I
    agree, but I would expect a basic search engine to fold dotted and
    dotless i together, on the basis that they cannot be distinguished
    reliably. With your solution the costs must be borne for all documents
    before moving to
    Unicode or its putative successor; with the solution chosen by Unicode,
    the costs need be borne only for the minority of documents which
    actually need casing rules applied to them.
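
    As a rough illustration of the kind of folding I would expect from a
    basic search engine, here is a minimal sketch in Python. The name
    fold_for_search is hypothetical, not taken from any library, and the
    sketch assumes that collapsing all four I variants onto plain "i" is
    acceptable for matching purposes.

```python
# Sketch only: fold dotted and dotless i together for search matching.
# fold_for_search is a hypothetical helper name, not a library function.

_I_FOLD = str.maketrans({
    "I": "i",        # U+0049 LATIN CAPITAL LETTER I
    "\u0130": "i",   # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u0131": "i",   # U+0131 LATIN SMALL LETTER DOTLESS I
})


def fold_for_search(text: str) -> str:
    """Return a language-neutral matching key for text.

    The I variants are mapped first, so that U+0130 is not expanded by
    the default case folding into "i" followed by U+0307 COMBINING DOT
    ABOVE. Everything else goes through ordinary case folding.
    """
    return text.translate(_I_FOLD).casefold()


if __name__ == "__main__":
    for word in ("\u0130stanbul", "ISTANBUL", "\u0131stanbul", "Istanbul"):
        print(repr(word), "->", fold_for_search(word))
```

    With such a key, Turkish and non-Turkish spellings of the same word
    match without any language markup. The price is that Turkish words
    differing only in dotted versus dotless i also collide.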

    We might hope that within twenty years almost all new documents will be
    marked with their language.

    Peter Kirk

