Re: New contribution

From: Dean Snyder
Date: Thu Apr 29 2004 - 10:13:00 EDT


    Mark E. Shoulson wrote at 1:01 AM on Thursday, April 29, 2004:

    >>As the situation stands right now, one simply encodes it in Hebrew or
    >>Latin transliteration, effectively deferring further analysis to other
    >>processes. This has its benefits.
    >And its drawbacks, since as you say, it's not an answer but a way of
    >avoiding an answer.

    Rather, deferring it to a level above the plain text level.

    >Mis-encoding? Another way to look at it is that by encoding it this way
    >or that way, you are thus making a *claim*, declaring the script to be
    >the one you most strongly believe it to be. What if you're wrong?
    >People, even respected researchers, have been wrong before, and science
    >marches on. (Other people, I mean; not me) If you don't want or don't
    >need to make such a claim, then you can use Hebrew as you do even now.
    >If you do want or need to make such a claim, then the consequences of
    >being wrong are the same as for any other claim.

    But I'm not sure these claims of distinction should be frozen at the
    plain text level.

    Problems arise when you want to develop and use software that works with
    this possible mishmash of conflictingly encoded "scripts".

    A simple example:

    You want to do a new Unicode-based dictionary of West Semitic
    inscriptions, something like Jean and Hoftijzer's Dictionnaire des
    inscriptions sémitiques de l'ouest. While amassing a large number of
    encoded texts you realize that some Phoenician texts are encoded as
    Hebrew, and vice versa; some Old Aramaic texts are encoded as Imperial
    Aramaic; and some Old Hebrew texts are encoded as Middle Hebrew, ... You
    want to do various categorizations of these texts based on script and you
    want to print your dictionary with appropriate fonts for quotations from
    the various texts. You will need to write software that re-encodes those
    texts you think are wrongly encoded and you may need fonts that, in the
    same font, make some of these diascripts look like others.
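    The re-encoding pass such dictionary software would need can be sketched
    as follows. This is a minimal illustration, not anyone's actual tool: it
    assumes Phoenician letters sit in their own block at U+10900 ff. (as in
    the proposal under discussion) while Hebrew uses U+05D0 ff., and the
    mapping table covers only a handful of letters.

```python
# Sketch: re-encode text judged to be wrongly encoded as "Phoenician"
# back into the Hebrew block. The Phoenician code points assumed here
# (U+10900 ff.) follow the proposal under discussion; the table is
# deliberately partial, for illustration only.

PHOENICIAN_TO_HEBREW = {
    "\U00010900": "\u05D0",  # ALF  -> ALEF
    "\U00010901": "\u05D1",  # BET  -> BET
    "\U00010902": "\u05D2",  # GAML -> GIMEL
    "\U00010903": "\u05D3",  # DELT -> DALET
    "\U00010913": "\u05E8",  # ROSH -> RESH
    "\U00010914": "\u05E9",  # SHIN -> SHIN
    "\U00010915": "\u05EA",  # TAU  -> TAV
}

def reencode(text: str, table: dict) -> str:
    """Replace every character found in the table; leave the rest alone."""
    return "".join(table.get(ch, ch) for ch in text)
```

    Note that every project making a different script judgment would need
    its own such table, and the inverse table as well, which is exactly the
    proliferation being objected to.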

    Now multiply this same mess for every other Unicode-based West Semitic
    dictionary, grammar, textbook, web page, research article, database,
    search engine, and end user software project and you begin to see the
    kinds of problems caused by the proliferation of wrong-headed sub-
    divisions of West Semitic "scripts".

    If we leave things as they are now, one need only tag the text for
    appropriate categorization and action.
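    The tagging alternative might look like this: the plain text stays
    uniformly encoded (say, as Hebrew), and the script judgment travels as
    metadata above it. The class, tag names, and font names below are
    hypothetical, purely to illustrate the layering.

```python
# Sketch: carry the script claim as markup-level metadata rather than
# freezing it into the encoding. All names here are hypothetical.

from dataclasses import dataclass

@dataclass
class TaggedText:
    text: str    # plain text, uniformly encoded (e.g. as Hebrew)
    script: str  # editorial judgment: "Phoenician", "Old Aramaic", ...

def font_for(script: str) -> str:
    """Pick a display font from the script tag, not from the encoding."""
    fonts = {"Phoenician": "PhoenicianFont", "Old Hebrew": "PaleoFont"}
    return fonts.get(script, "DefaultSemiticFont")

quotation = TaggedText("\u05D0\u05D1", script="Phoenician")
```

    Revising the judgment then means changing a tag, not rewriting the
    underlying code points.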

    In my view, encoding is more basal, more static; markup is derivative and
    interpretive (based as it is on encoded text, for example), and therefore
    more transient and ephemeral.

    The question whose answer we need to plausibly defend is this: What, in
    Ancient West Semitic "scripts", is usefully distinguished in PLAIN TEXT,
    and what is not?

    If I had to take a position right now, I would think that encoding Old
    Canaanite (not Phoenician) and Samaritan is useful, but I would leave
    Aramaic, et al. for more expert, soul-searching discussion.


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897

    This archive was generated by hypermail 2.1.5 : Thu Apr 29 2004 - 10:54:45 EDT