Re: Aramaic unification and information retrieval

From: Michael Everson (everson@evertype.com)
Date: Sun Dec 21 2003 - 19:01:13 EST


    At 14:18 -0800 2003-12-21, Peter Kirk wrote:

    >So, "KA is KA is KA is KA and BHA is BHA is BHA is BHA", and ALEF is
    >ALEF is ALEF is ALEF, except when it comes to comparing them and
    >collating them?

    In the context in which I was speaking, yes. The Indic KAs have a
    one-to-one relationship, historically. We know this. Likewise the
    Semitic ALEFs. That doesn't mean that we should unify the Indic
    scripts all into one (which we haven't) or that we should unify all
    the Semitic scripts into one.

    If you have a multiscript database for Pali and you need to search
    all the KAs across scripts, you will have to have a local engine to
    do so. The scripts are distinct as encoded in the Unicode standard.
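
    A minimal sketch of what such a local engine might look like, in
    Python. The fold table, the function names and the choice of three
    scripts are illustrative assumptions, not part of any standard; a
    real Pali tool would need a full table for every script in the
    database.

```python
# Hypothetical fold table: each script's letter maps to a script-neutral key.
# Only KA and BHA in three scripts are listed; this is an assumption made
# for illustration, not a complete or standard mapping.
FOLD = {
    "\u0915": "KA",   # DEVANAGARI LETTER KA
    "\u0995": "KA",   # BENGALI LETTER KA
    "\u1000": "KA",   # MYANMAR LETTER KA
    "\u092D": "BHA",  # DEVANAGARI LETTER BHA
    "\u09AD": "BHA",  # BENGALI LETTER BHA
    "\u1018": "BHA",  # MYANMAR LETTER BHA
}


def fold(text):
    """Replace each known letter by its script-neutral key; keep the rest."""
    return tuple(FOLD.get(ch, ch) for ch in text)


def find_across_scripts(needle, records):
    """Return the records containing the needle in any of the mapped scripts."""
    key = fold(needle)
    n = len(key)
    hits = []
    for record in records:
        folded = fold(record)
        # Compare every window of the folded record against the folded needle.
        if any(folded[i:i + n] == key for i in range(len(folded) - n + 1)):
            hits.append(record)
    return hits


records = ["\u0915\u092D", "\u0995", "\u1018", "abc"]
# Searching for Myanmar KA also finds the Devanagari and Bengali KAs.
matches = find_across_scripts("\u1000", records)
```

    The encoded characters stay distinct; only the search key is
    unified, and only inside the local tool.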

    If you want to sort such a database, illegible as the result would
    be, you can do it, with a local tailoring for your specific purpose.
    The default table in the UCA will not interfile them, however,
    because it orders the scripts sequentially (apart from digits, which
    are treated differently because of their particular properties). I'm
    not saying you can't tailor. You can. I'm saying we're not going to
    change what we are doing in the UCA and ISO/IEC 14651, because they
    distinguish scripts on purpose.
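
    As a rough sketch of the difference, in Python: plain code point
    order is used below only as a stand-in for a script-by-script
    default (it is not the actual UCA table), and the tailored key is a
    hypothetical local tailoring that ranks by letter first and by
    script second. The ranks and names are assumptions made for
    illustration.

```python
# Hypothetical tailoring table: character -> (letter rank, script rank).
# KA sorts before BHA here purely for the sake of the example.
LETTERS = {
    "\u0915": (0, 0),  # KA,  Devanagari
    "\u092D": (1, 0),  # BHA, Devanagari
    "\u0995": (0, 1),  # KA,  Bengali
    "\u09AD": (1, 1),  # BHA, Bengali
    "\u1000": (0, 2),  # KA,  Myanmar
    "\u1018": (1, 2),  # BHA, Myanmar
}


def tailored_key(word):
    """Sort key: letter identity is primary, script is only a tie-breaker."""
    primary = []
    secondary = []
    for ch in word:
        if ch in LETTERS:
            letter, script = LETTERS[ch]
            primary.append((0, letter))
            secondary.append(script)
        else:
            # Unmapped characters sort after mapped ones, by code point.
            primary.append((1, ord(ch)))
            secondary.append(0)
    return (primary, secondary)


words = ["\u1018", "\u0995", "\u092D", "\u1000", "\u09AD", "\u0915"]

# Stand-in for the default: scripts stay in separate blocks.
by_script = sorted(words)

# Local tailoring: all the KAs are interfiled, then all the BHAs.
interfiled = sorted(words, key=tailored_key)
```

    The first ordering keeps each script together; the second
    interfiles the KAs ahead of the BHAs, which is the kind of result a
    local tailoring can give without any change to the default table.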

    >Of course if one collates together a mixture of Latin script texts
    >in very different fonts and styles one can get an outrageously messy
    >list which is illegible to those who don't know all the different
    >fonts.

    I do not consider the Semitic nodes we are considering for eventual
    encoding to be font variants of each other.

    >But that is hardly the point. Anyway, I don't see the main purpose
    >of collation as producing lists of legible words, but rather as
    >matching in text and database searches.

    Which you as an expert can do with special tools.

    >Michael, do you realise that I am trying to offer you an olive
    >branch, and all I get is it thrown back in my face, nicely by you
    >but rudely by someone else offlist.

    No, I didn't. In the first place I didn't know that we were at war.
    In the second place, all I'm telling you is that we have practices
    which are generic to certain levels of our work, and we are not
    likely to deviate from those practices. That's not throwing something
    in your face. That's telling you what's what. We had a similar
    discussion about generic practice when we were putting Runic into the
    UCA. Swedish specialists wanted a Latin-based order. That's specific.
    Everyone else, though, would want the native Futhark order. The
    Japanese NB, which doesn't really worry about Runes much, thought
    that the generic order should be the basic historical one.

    >I think that it just might be acceptable to encode the various
    >ancient Semitic scripts separately if they are unified for collation.

    You can tailor a unified collation for them or indeed for anything you like.

    >But if you are saying that it must be all or nothing, I will
    >continue to fight on behalf of the users of these scripts for all of
    >what they want, rather than what you have apparently unilaterally
    >(on the basis of a book which describes glyph shape differences
    >rather than the systematic differences which really distinguish
    >scripts) decided that they ought to want and have written into your
    >Roadmap.

    *I* have not decided on the basis of *one* book, thanks very much.
    Nor have I done anything unilaterally. Nor have we made decisions
    which aren't based on our normal working practice.

    I'm not interested in worrying about these bits of the Roadmap right
    now. If I work on anything over the Christmas, it should be N'Ko.
    Then there is more work on Cuneiform. Then work on Manichaean and
    Avestan. Then I've got to prepare for the PDAM comments. This
    sniping, even when nice, isn't doing you any good, nor me. Can we
    drop this for a while, please?

    Michael

    (I am sorry you had rude private mail from someone. I also had
    private mail from someone which suggested that I didn't know anything
    about Indic scripts, while saying a whole lot of other rather
    incomprehensible things about ISCII and Unicode. Better forgotten.)


