From: Dean Snyder (dean.snyder@jhu.edu)
Date: Thu Apr 29 2004 - 10:13:00 EDT
Mark E. Shoulson wrote at 1:01 AM on Thursday, April 29, 2004:
>>As the situation stands right now, one simply encodes it in Hebrew or
>>Latin transliteration, effectively deferring further analysis to other
>>processes. This has its benefits.
>>
>And its drawbacks, since as you say, it's not an answer but a way of
>avoiding an answer.
Rather, deferring it to a level above the plain text level.
>Mis-encoding? Another way to look at it is that by encoding it this way
>or that way, you are thus making a *claim*, declaring the script to be
>the one you most strongly believe it to be. What if you're wrong?
>People, even respected researchers, have been wrong before, and science
>marches on. (Other people, I mean; not me) If you don't want or don't
>need to make such a claim, then you can use Hebrew as you do even now.
>If you do want or need to make such a claim, then the consequences of
>being wrong are the same as for any other claim.
But I'm not sure these claims of distinction should be frozen at the
plain text level.
Problems arise when you want to develop and use software that works with
this possible mishmash of conflictingly encoded "scripts".
A simple example:
You want to do a new Unicode-based dictionary of West Semitic
inscriptions, something like Jean and Hoftijzer's Dictionnaire des
inscriptions sémitiques de l'ouest. While amassing a large number of
encoded texts you realize that some Phoenician texts are encoded as
Hebrew, and vice versa; some Old Aramaic texts are encoded as Imperial
Aramaic; and some Old Hebrew texts are encoded as Middle Hebrew, ... You
want to do various categorizations of these texts based on script and you
want to print your dictionary with appropriate fonts for quotations from
the various texts. You will need to write software that re-encodes those
texts you think are wrongly encoded and you may need fonts that, in the
same font, make some of these diascripts look like others.
Now multiply this same mess for every other Unicode-based West Semitic
dictionary, grammar, textbook, web page, research article, database,
search engine, and end user software project and you begin to see the
kinds of problems caused by the proliferation of wrong-headed
subdivisions of West Semitic "scripts".
If we leave things as they are now, one need only tag the text for
appropriate categorization and action.
In my view, encoding is more basal, more static; markup is derivative and
interpretive (based as it is on encoded text, for example), and therefore
more transient and ephemeral.
The question whose answer we need to plausibly defend is: What, in
Ancient West Semitic "scripts", is usefully distinguished in PLAIN TEXT,
and what is not?
If I had to take a position right now, I would think that encoding Old
Canaanite (not Phoenician) and Samaritan is useful, but I would leave
Aramaic, et al. for more expert, soul-searching discussion.
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi