Re: interleaved ordering (was RE: Phoenician)

From: Philippe Verdy (
Date: Thu May 13 2004 - 03:45:07 CDT

    From: "D. Starner" <>
    > What's the actual usage pattern for multi-lingual sorts?

    The most common uses would be building a book index that references the
    pages where words are used, plain-text search in a large corpus of texts in
    various languages, or sorting a list of people's names or book titles in a
    reference book or diary. (In that last case the collation is performed not
    for searching but for presenting the results in a way that users can read
    easily.)

    Unless multiple scripts are used simultaneously within the same words, I see
    little value in sorting the results with interleaved ordering, unless the
    script is really unpredictable and one wants to find all occurrences of the
    *same* word written in distinct scripts.
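
    As a rough illustration of how one might check whether a word really mixes
    scripts, here is a minimal Python sketch. It assumes that the first word of
    each character's Unicode name ("LATIN", "CYRILLIC", "HEBREW", ...) is a good
    enough stand-in for the script; that is only a heuristic, not the Unicode
    Script property, and the function names are made up for this example.

```python
import unicodedata

def scripts_in(word):
    """Return the set of rough script labels used by the letters of `word`.

    Assumption: the first word of a character's Unicode name approximates
    its script. Non-letters (digits, punctuation) are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }

def is_mixed_script(word):
    """True if the letters of `word` come from more than one script label."""
    return len(scripts_in(word)) > 1

print(scripts_in("CCCP"))                      # Latin letters only
print(scripts_in("\u0421\u0421\u0421\u0420"))  # Cyrillic letters only
print(is_mixed_script("C\u0421"))              # Latin C next to Cyrillic ES
```

    Running such a check over a word list would show how rare mixed-script
    words are in a given corpus.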

    This last case does happen in modern Japanese, because it is the same language,
    but I really doubt that there is any value in mixing such results between
    Hebrew and Phoenician. This could happen only in some scholarly applications,
    but most users will want these scripts separated, as they refer to distinct
    languages.

    If there are still cases where one really wants to mix the results of
    searches, one can still use regular expressions to find words using bracketed
    pairs of letters, and still see the results sorted separately for each script
    (no tailoring is needed in that case). If searching in a Phoenician-to-Hebrew
    translation book, entries will clearly make the distinctions of
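
    The bracketed-pairs search can be sketched as follows in Python. This is
    only an illustration: the PAIRS table covers just two letters (Hebrew
    alef/bet and Phoenician alf/bet), the helper names are invented here, and
    the script label is again guessed from the first word of the Unicode
    character name.

```python
import re
import unicodedata
from collections import defaultdict

# Illustrative pairing of Hebrew letters with Phoenician counterparts
# (only the two letters needed here; a real table would cover the alphabet).
PAIRS = {
    "\u05D0": "\U00010900",  # HEBREW LETTER ALEF / PHOENICIAN LETTER ALF
    "\u05D1": "\U00010901",  # HEBREW LETTER BET  / PHOENICIAN LETTER BET
}

def bracketed_pattern(hebrew_word):
    """Build a regex where each known letter becomes a [hebrew phoenician] class."""
    parts = []
    for ch in hebrew_word:
        other = PAIRS.get(ch)
        if other is None:
            parts.append(re.escape(ch))
        else:
            parts.append("[" + ch + other + "]")
    return re.compile("".join(parts))

def script_of(word):
    """Rough script label taken from the Unicode name of the first character."""
    return unicodedata.name(word[0]).split()[0]

def search_by_script(hebrew_word, words):
    """Match in both scripts, but return the hits sorted separately per script."""
    pattern = bracketed_pattern(hebrew_word)
    groups = defaultdict(list)
    for w in words:
        if pattern.fullmatch(w):
            groups[script_of(w)].append(w)
    return {script: sorted(hits) for script, hits in groups.items()}

words = ["\u05D0\u05D1", "\U00010900\U00010901", "\u05D1\u05D0", "ab"]
for script, hits in search_by_script("\u05D0\u05D1", words).items():
    print(script, hits)
```

    The search itself crosses scripts, while each script's hits stay in their
    own list with the default code-point order, so no collation tailoring is
    involved.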

    Quote me a single personal name whose letters mix multiple scripts. I think
    you will have great difficulty finding such occurrences, compared with the
    much larger corpus of texts using a single script. The scripts may be
    genetically related, but even today we don't mix Latin and Greek or Latin and
    Cyrillic in the same words, because of the reading ambiguities it would create
    (notably with Cyrillic-Latin: P or R, N or H...); instead we transliterate
    one script into the other. (For example, we'll search for "SSSR" in Latin or
    "CCCP" in Cyrillic; searching for the Latin letters "CCCP" in a Russian text
    would find no occurrence.)
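
    The "CCCP" point is easy to check in Python. The sketch below writes the
    Cyrillic letters as escapes so that they cannot be confused with their
    Latin look-alikes; the two-letter transliteration table is an assumption
    made for the example, not a complete standard.

```python
# Latin look-alikes versus the real Cyrillic letters.
LATIN_CCCP = "CCCP"                         # U+0043 U+0043 U+0043 U+0050
CYRILLIC_SSSR = "\u0421\u0421\u0421\u0420"  # CYRILLIC ES, ES, ES, ER

# Illustrative two-letter transliteration table (Cyrillic to Latin).
TO_LATIN = str.maketrans({"\u0421": "S", "\u0420": "R"})

def transliterate(text):
    """Map the Cyrillic letters listed in TO_LATIN to Latin; leave the rest."""
    return text.translate(TO_LATIN)

russian_text = CYRILLIC_SSSR + " (1922-1991)"

print(russian_text.find(LATIN_CCCP))     # -1: the Latin letters are not there
print(russian_text.find(CYRILLIC_SSSR))  # 0: the Cyrillic letters are
print(transliterate(CYRILLIC_SSSR))      # SSSR
```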
