From: Peter Kirk (firstname.lastname@example.org)
Date: Thu May 13 2004 - 10:06:22 CDT
On 13/05/2004 05:01, John Cowan wrote:
>Peter Kirk scripsit:
>>>I would have just as many objections to doing that as I would with
>>>unifying it with Hebrew. Users don't expect this kind of interfiling
>>>when looking things up in ordered lists. Interfiling of scripts
>>Well, I see the point. But presumably the only people who would collate
>>a text containing a mixture of Hebrew and Phoenician, for example, are
>>those who know and understand both scripts. For anyone else this is a
>>matter of garbage in, garbage out. So it should be up to these users to
>>decide whether the legibility concern, which is a real one, is more
>>important than their otherwise expressed preference for interfiling.
>In addition, it's important to always remember that "collation" is a
>cover term for both sorting *and* searching. Collating Hebrew with
>"Phoenician" at the first level means that a search using Hebrew
>letters will find "Phoenician" text as well.
>(I am using horror quotes to remind people that Unicode "Phoenician"
>includes many non-Punic 22CWSAs, particularly Palaeo-Hebrew.)
>If indeed Serbs prefer collation equivalence between Cyrillic and
>Latin (which can only be a tailored preference, of course; in general
>we don't want to do that), this means not only that they will see
>the two interfiled in a sorted list, but also that searching for a
>Serbian word in Cyrillic will find it in Latin and vice versa.
Thank you, John, for making the point which most others have missed.
This issue is not primarily one of sorting, because multi-script
individual texts are rather rare. The far more significant issue is
searching, of a text corpus or for that matter of the whole Internet.
Suppose I am looking for a Hebrew or Phoenician, or Serbian or
Azerbaijani, text on the Internet. I don't know which script it is in,
and (if I can read both scripts) I don't care; I want to match the text
anyway. For such applications interleaved collation would be very helpful.
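Here is a minimal Python sketch of what first-level (primary) equivalence between the two alphabets would mean for a search. It assumes the Phoenician letters as later encoded at U+10900..U+10915, in the same order as the 22 Hebrew base letters. It folds Hebrew final forms to their base letters and does plain character folding plus substring matching, not real UCA weighting. `primary_key` and `search` are hypothetical names.

```python
# Sketch only: a toy "primary-level" fold in which each Phoenician letter
# compares equal to its Hebrew counterpart. Real UCA collation uses weight
# tables and several levels; this is plain character folding.

# Hebrew final forms (kaf, mem, nun, pe, tsadi); each immediately
# precedes its base letter in the code chart.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}

# The 22 Hebrew base letters, alef (U+05D0) to tav (U+05EA), in order.
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# The 22 Phoenician letters as later encoded, alf (U+10900) to tau (U+10915).
PHOENICIAN = list(range(0x10900, 0x10916))

# Folding table: Phoenician letter -> Hebrew letter, Hebrew final -> base form.
FOLD = {p: chr(h) for p, h in zip(PHOENICIAN, HEBREW_BASE)}
FOLD.update({f: chr(f + 1) for f in HEBREW_FINALS})


def primary_key(text: str) -> str:
    """Map text to a script-neutral key (hypothetical helper)."""
    return text.translate(FOLD)


def search(needle: str, haystack: str) -> bool:
    """Substring search that ignores the Hebrew/Phoenician distinction."""
    return primary_key(needle) in primary_key(haystack)


if __name__ == "__main__":
    hebrew_mlk = "\u05DE\u05DC\u05DA"                  # m-l-k: mem, lamed, final kaf
    phoenician_mlk = "\U0001090C\U0001090B\U0001090A"  # m-l-k: mem, lamd, kaf
    print(search(hebrew_mlk, phoenician_mlk))  # True
```

A search typed in Hebrew letters finds the same word written in Phoenician, and vice versa. That is exactly the side effect of first-level equivalence that John points out.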
I am not proposing interleaved collation of Latin and Cyrillic as a
default, simply because each of the several languages which can be
written in both scripts has a different transliteration scheme. So
tailoring will be required to do this kind of searching for Serbian or
Azerbaijani. But we have the chance to start afresh with Phoenician, and
the correspondence between the Hebrew and Phoenician alphabets is
one-to-one.
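A Serbian tailoring could be sketched in Python the same way. This is an assumption, not any standard tailoring table: lowercase the text, then fold Serbian Cyrillic to Gaj's Latin before comparing. The fold goes from Cyrillic to Latin because the digraphs lj, nj and dž each stand for a single Cyrillic letter, but the same Latin sequences can also spell two letters (for instance injekcija, written инјекција). The reverse direction is therefore ambiguous. `sr_key` and `sr_search` are hypothetical names, and Azerbaijani would need a different table.

```python
# Sketch only: a Serbian-specific tailoring that folds Cyrillic to Latin
# before comparison, so a search in either script finds the other.
# The table below is an assumption covering the 30-letter Serbian alphabet;
# another language (e.g. Azerbaijani) would need its own table.

SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}
# str.translate wants code points as keys.
SR_FOLD = {ord(c): lat for c, lat in SR_CYR_TO_LAT.items()}


def sr_key(text: str) -> str:
    """Lowercase, then fold Serbian Cyrillic to Latin (hypothetical helper)."""
    return text.lower().translate(SR_FOLD)


def sr_search(needle: str, haystack: str) -> bool:
    """Substring search that treats the two Serbian scripts as equal."""
    return sr_key(needle) in sr_key(haystack)


if __name__ == "__main__":
    print(sr_search("Београд", "Putovanje u Beograd"))  # True
```

The point of the sketch is that the table belongs to the language, not to the script pair. That is why Latin/Cyrillic equivalence can only be a tailoring, whereas the fixed one-to-one Hebrew/Phoenician correspondence could be built in once.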
Perhaps someone, some day, will produce an Internet search engine which
accepts Unicode tailored collation. But I won't hold my breath.
PS Multi-language bibliographies are common in Russian books. They are
usually printed with the Latin script entries following the Cyrillic
script ones, but I have seen interleaved ones.
-- Peter Kirk email@example.com (personal) firstname.lastname@example.org (work) http://www.qaya.org/