Re: interleaved ordering (was RE: Phoenician)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu May 13 2004 - 03:45:07 CDT

Next message: Kent Karlsson: "RE: any unicode conversion tools?"

Previous message: Antoine Leca: "Re: TR35"
In reply to: D. Starner: "RE: interleaved ordering (was RE: Phoenician)"
Next in thread: Michael Everson: "RE: interleaved ordering (was RE: Phoenician)"

From: "D. Starner" <shalesller@writeme.com>
> What's the actual usage pattern for multi-lingual sorts?

The most common uses would be building a book index that lists the pages
where each word appears, or plain-text search in a large corpus of texts in
various languages, or sorting a list of personal names or book titles in a
reference book or diary. (In that last case the collation is not performed
for searching but for presenting the results in a way that users can read
easily.)

Unless multiple scripts are used simultaneously within the same words, I see
little value in sorting the results with interleaved ordering, except when the
choice of script is really unpredictable and one wants to find all occurrences
of the *same* word written in distinct scripts.

This last case does happen in modern Japanese, because it is the same language,
but I really doubt there is any value in mixing such results between Hebrew and
Phoenician. This might happen only in some scholarly applications, but most
users will want these scripts separated, as they refer to distinct languages.
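
To make the Japanese case concrete, here is a minimal sketch of the difference
between script-separated and interleaved ordering. It is only a toy: it relies
on the fact that the basic katakana letters sit 0x60 code points above their
hiragana counterparts, and the names fold_to_hiragana and interleaved_key are
made up for this illustration. A real implementation would use a proper
collation library rather than code point arithmetic.

```python
# Toy interleaved ordering for Japanese kana (not a real collation algorithm).
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60  # katakana code point minus the matching hiragana code point


def fold_to_hiragana(word):
    """Map each katakana letter to its hiragana counterpart; leave the rest alone."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in word
    )


def interleaved_key(word):
    # Primary key: the script-neutral spelling. Tie-breaker: the original spelling.
    return (fold_to_hiragana(word), word)


words = ["サクラ", "いぬ", "さくら", "イヌ"]

# Plain code point order: all hiragana words first, then all katakana words.
separated = sorted(words)

# Interleaved order: the same word in both scripts ends up adjacent.
interleaved = sorted(words, key=interleaved_key)
```

With the plain sort the two spellings of each word are kept apart by script;
with the interleaved key they sit next to each other, which is only useful
when both spellings really are the same word in the same language.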

If there are still cases where one really wants to mix the results of
searches, one can still use regular expressions that match words using bracketed
pairs of letters, and still see the results sorted separately for each script
(so no tailoring is needed in that case). When searching in a
Phoenician-to-Hebrew translation book, the entries will clearly distinguish the
two languages.
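
The bracketed-pairs idea could look roughly like the sketch below. Everything
named here (PAIRS, bracketed_pattern, script_of, search_grouped) is invented
for the illustration, the table covers only two letters, and the script label
is taken crudely from the Unicode character name, so treat it as an assumption
about one possible approach rather than a reference implementation.

```python
import re
import unicodedata

# Hypothetical pairing table, limited to two letters for brevity.
PAIRS = {
    "\u05d0": "\U00010900",  # HEBREW LETTER ALEF / PHOENICIAN LETTER ALF
    "\u05d1": "\U00010901",  # HEBREW LETTER BET  / PHOENICIAN LETTER BET
}


def bracketed_pattern(hebrew_word):
    """Turn each letter into a bracketed [Hebrew Phoenician] pair when one is known."""
    parts = []
    for ch in hebrew_word:
        other = PAIRS.get(ch)
        if other:
            parts.append("[" + re.escape(ch) + re.escape(other) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def script_of(word):
    """Crude script label: the first word of the first character's Unicode name."""
    return unicodedata.name(word[0]).split()[0]


def search_grouped(pattern, words):
    """Match in either script, but keep the hits grouped by script."""
    hits = [w for w in words if pattern.fullmatch(w)]
    # Sort by script label first, then by code point within each script.
    return sorted(hits, key=lambda w: (script_of(w), w))


hebrew_ab = "\u05d0\u05d1"
phoenician_ab = "\U00010900\U00010901"
results = search_grouped(
    bracketed_pattern(hebrew_ab),
    [phoenician_ab, "ab", hebrew_ab, "\u05d1\u05d0"],
)
```

The search is script-insensitive, yet the default ordering of the hits is left
alone: the Hebrew spelling and the Phoenician spelling come back in separate
groups, with no tailored collation involved.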

Quote me a single personal name whose letters mix multiple scripts. I think
you'll have great difficulty finding such occurrences, compared with the much
larger corpus of texts using a single script. The scripts may be genetically
related, but even today we don't mix Latin and Greek or Latin and Cyrillic in
the same words, because of the reading ambiguities it would create (notably
with Cyrillic and Latin: P or R, N or H...). Instead we transliterate one
script into the other. (For example, we'll search for "SSSR" in Latin or
"CCCP" in Cyrillic... searching for the Latin letters "CCCP" in a Russian text
would find no occurrence.)
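
The "CCCP" point is easy to check. In the sketch below the sample text and the
helper name scripts_used are made up for the illustration; the code points are
the standard ones for the Latin and Cyrillic capitals.

```python
import unicodedata

latin_cccp = "CCCP"                         # U+0043 three times, then U+0050
cyrillic_sssr = "\u0421\u0421\u0421\u0420"  # CYRILLIC CAPITAL LETTER ES x3, then ER


def scripts_used(word):
    """Crude script labels: the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in word}


# Hypothetical sample of Russian text containing the Cyrillic abbreviation.
russian_text = "sample: " + cyrillic_sssr

found_latin = latin_cccp in russian_text        # look-alike Latin letters do not match
found_cyrillic = cyrillic_sssr in russian_text  # the real Cyrillic letters do
```

The two strings look the same on screen but share no code points, so a plain
search treats them as unrelated.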


This archive was generated by hypermail 2.1.5 : Thu May 13 2004 - 03:46:39 CDT