Re: Aramaic unification and information retrieval

From: Peter Kirk (peterkirk@qaya.org)
Date: Sun Dec 21 2003 - 17:18:40 EST

    On 21/12/2003 13:16, Michael Everson wrote:

    > At 12:33 -0800 2003-12-21, Peter Kirk wrote:
    >
    >>> Nonsense. Of course you can. KA is KA is KA is KA and BHA is BHA is
    >>> BHA is BHA. The *reading rules* for pronouncing what's been written
    >>> differ, but the transliteration is by and large one-to-one. Tamil of
    >>> course is an exception, having lost some consonants.
    >>
    >>
    >> Michael, in view of this do you think it might be sensible to treat
    >> the different Indic scripts as equivalent for collation purposes?
    >
    >
    > No, not at all. Not in the default template. The default template
    > sorts scripts separately.
    >
    >> This might be especially useful with a corpus of material in one
    >> language e.g. Sanskrit but using different scripts.
    >
    >
    > Actually I rather think it would form a list which was an outrageously
    > illegible mess.
    >
    >> And then, how about the Semitic scripts? After all, ALEF is ALEF is
    >> ALEF is ALEF and ...
    >
    >
    > Nope. It would also be an outrageously illegible mess. But you can
    > tailor it locally if you wanted to.

    So, "KA is KA is KA is KA and BHA is BHA is BHA is BHA" (to quote
    Michael Everson, just in case the person who accused me offlist of
    talking nonsense as if it was authoritative misunderstands the
    situation), and ALEF is ALEF is ALEF is ALEF, except when it comes to
    comparing them and collating them?

    Of course if one collates together a mixture of Latin script texts in
    very different fonts and styles one can get an outrageously messy list
    which is illegible to those who don't know all the different fonts. But
    that is hardly the point. Anyway, I don't see the main purpose of
    collation as producing lists of legible words, but rather as matching in
    text and database searches.
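
    To make "matching" concrete, here is a rough sketch of a search key
    that treats KA as KA across the Indic scripts. It is not a tailoring
    of the default collation template, only a crude fold, and it rests on
    an assumption: that the blocks from Devanagari (U+0900) to Malayalam
    (U+0D7F) are each 128 code points wide and share the ISCII-derived
    layout, so the same offset means the same letter. The names
    fold_indic and same_key are hypothetical. Gaps and script-specific
    additions are not handled, so real use would need a proper mapping
    table.

```python
# Assumption: U+0900..U+0D7F holds ten Indic blocks of 128 code points each,
# laid out in parallel (ISCII-derived), so equal offsets mean equal letters.
INDIC_START = 0x0900
INDIC_END = 0x0D7F
BLOCK_SIZE = 0x80


def fold_indic(text: str) -> str:
    """Hypothetical matching key: fold Indic letters onto the Devanagari block."""
    folded = []
    for ch in text:
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            # Keep the offset within the block, discard which block it came from.
            offset = (cp - INDIC_START) % BLOCK_SIZE
            folded.append(chr(INDIC_START + offset))
        else:
            # Anything outside the Indic range is left untouched.
            folded.append(ch)
    return "".join(folded)


def same_key(a: str, b: str) -> bool:
    """True if two strings match once the script distinction is folded away."""
    return fold_indic(a) == fold_indic(b)


# Bengali KA + BHA compared with Devanagari KA + BHA
print(same_key("\u0995\u09AD", "\u0915\u092D"))
```

    A search over a Sanskrit corpus in mixed scripts could compare such
    keys while still displaying each text in its original script.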

    Michael, do you realise that I am trying to offer you an olive branch?
    All I get is to have it thrown back in my face, nicely by you but rudely
    by someone else offlist. I think that it just might be acceptable to encode
    the various ancient Semitic scripts separately if they are unified for
    collation. But if you are saying that it must be all or nothing, I will
    continue to fight on behalf of the users of these scripts for all of
    what they want, rather than what you have apparently unilaterally (on
    the basis of a book which describes glyph shape differences rather than
    the systematic differences which really distinguish scripts) decided
    that they ought to want and have written into your Roadmap.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

