From: Dean Snyder (firstname.lastname@example.org)
Date: Fri May 07 2004 - 07:57:21 CDT
Peter Constable wrote at 1:00 PM on Thursday, May 6, 2004:
>> Here are the polar choices for XML:
>> TAGGED (but not encoded)...
>> ENCODED (but not tagged)...
>Note that tagging can be used as well as distinct encoding.
Of course - that's one reason why I said "polar choices".
>> The tagged version is not a "font minefield". On the contrary, it
>> explicitly provides an international standard mechanism for a level of
>> specification and refinement not possible via encoding. You can, for
>> example, do things like: <Phn subscript="Punic" locus="Malta"
>> font="Maltese Falcon">BT 'LM</Phn>. In fact, this is precisely the
>> kind of thing for which XML was designed.
>It *is* a minefield, because the correct interpretation of the text is
>dependent on particular fonts being on the recipients' systems. That
>fails the criterion of plain text legibility.
Of course. But that does not make tagged text a minefield - in the
absence of your nice Phoenician font, Hebrew would show up instead -
precisely what is used by and large by Semiticists right now.
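As a rough illustration of the tagged approach, here is a minimal Python sketch. It reuses the <Phn> element and its attributes from the quoted example. The <text> wrapper, the sample sentence, and the function names are assumptions made up for this illustration, and the Latin transliteration stands in for whatever Hebrew-encoded text a real document would carry. It shows that a search can ignore the markup entirely, while software that cares about the diascript can still read it from the attributes.

```python
import xml.etree.ElementTree as ET

# The <Phn> element comes from the quoted example; the <text> wrapper
# and surrounding words are hypothetical, added only for illustration.
SAMPLE = (
    '<text>inscription: '
    '<Phn subscript="Punic" locus="Malta" font="Maltese Falcon">BT \'LM</Phn>'
    ' (end)</text>'
)

def plain_text(xml_source: str) -> str:
    """Return the character data with all tags stripped (for searching)."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())

def tagged_runs(xml_source: str, tag: str = "Phn"):
    """Return (text, attributes) pairs for each element named `tag`."""
    root = ET.fromstring(xml_source)
    return [(el.text or "", dict(el.attrib)) for el in root.iter(tag)]

if __name__ == "__main__":
    print(plain_text(SAMPLE))
    for text, attrs in tagged_runs(SAMPLE):
        print(text, attrs)
```

The search side only ever sees the stripped text, so the font named in the tag affects display and nothing else.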
>> The untagged, but differently encoded version, on the other hand, IS a
>> search and text processing quagmire, especially when confronted by the
>> possibility of having to deal with multiplied West Semitic encodings,
>> e.g., for the various Aramaic "scripts" and Samaritan.
>Again, I find I have to disagree. It is much easier in searching to
>neutralize a distinction than to infer one. And, as has been stated, if
>there are distinct encodings, a given researcher can still use common
>indexing for their data if that suits their purpose.
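"Neutralizing a distinction" at search time could look something like the following Python sketch. It is a sketch under stated assumptions, not anything proposed in this thread: it assumes a separate 22-letter Phoenician block at U+10900..U+10915 (no such block existed when this message was written; that range was assigned later), and the names `fold` and `matches` are hypothetical. It also folds the five Hebrew final forms to their base letters, since the older script has no final forms.

```python
# Hebrew non-final letters alef..tav, in alphabetical order (22 letters).
HEBREW_NONFINAL = [
    0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5, 0x05D6, 0x05D7,
    0x05D8, 0x05D9, 0x05DB, 0x05DC, 0x05DE, 0x05E0, 0x05E1, 0x05E2,
    0x05E4, 0x05E6, 0x05E7, 0x05E8, 0x05E9, 0x05EA,
]

# Hebrew final forms mapped to their base letters (kaf, mem, nun, pe, tsadi).
FINAL_TO_BASE = {
    0x05DA: 0x05DB, 0x05DD: 0x05DE, 0x05DF: 0x05E0,
    0x05E3: 0x05E4, 0x05E5: 0x05E6,
}

# ASSUMPTION: a distinct Phoenician block starting at U+10900, with its
# 22 letters in the same alphabetical order as the Hebrew list above.
PHOENICIAN_START = 0x10900

FOLD = {PHOENICIAN_START + i: cp for i, cp in enumerate(HEBREW_NONFINAL)}
FOLD.update(FINAL_TO_BASE)

def fold(text: str) -> str:
    """Map both encodings onto one search key (Hebrew base letters)."""
    return text.translate(FOLD)

def matches(needle: str, haystack: str) -> bool:
    """Substring search that ignores the Phoenician/Hebrew distinction."""
    return fold(needle) in fold(haystack)
```

The folding table itself is short. It helps only if every tool in the chain applies the same table, and it does nothing to recover a distinction that was never encoded.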
I like the way Mark Davis put it (he uses the word "nightmare" for
processing over-deunified text):
Mark Davis wrote at 8:22 PM on Monday, May 3, 2004:
>- There is a cost to deunification. To take an extreme case, suppose that we
>deunified Rustics, Roman Uncials, Irish Half-Uncial, Carolingian Minuscule,
>Textura, Fraktur, Humanist, Chancery (Italic), and English Roundhand. All
>very different shapes. Searching/processing Latin text would be a nightmare.
>- There is also a cost to unification. To take an extreme case, suppose we
>unified Latin, Greek, Cyrillic, Arabic, and Hebrew (after all, they have a
>common ancestor). Again, nightmare.
>So there is always a balance that we have to strike, looking at each
>carefully and assessing a number of different factors.
This is ALL I am trying to do here - just presenting some perspectives
that may not be apparent to non-specialists, in the hopes it will make
for a better informed decision.
>As has been stated, the distinct needs of two communities can be served
>well with two encodings; it is much more difficult to serve the distinct
>needs of a second group if the distinct things they want are merged into
>what the first group uses.
The problem is you are seeing this as "two encodings" for "two
communities". This does not represent the ground reality for West Semitic
researchers, who have to deal with many "encodings" for many communities.
Here is just ONE simple example of the kinds of problems we will be
confronted with if we start deunifying Northwest Semitic scripts:
As I've stated earlier, I (and others) clearly recognize a milestone shift
between pre-exilic Old Hebrew "script" (based on Old Canaanite) and post-
exilic Jewish Hebrew "script" (based on Official Aramaic, which, in turn,
was based on Old Canaanite). This is a very clear-cut script shift
implemented by Jewish scribes at the time - almost perfectly analogous to
the Fraktur to modern German script shift.
If we deunify Old Canaanite/Phoenician from Hebrew, we will be faced with
In the Dead Sea Scrolls, in the same "library", there are some Biblical
manuscripts written in Old Hebrew and some Biblical manuscripts written
in Jewish Hebrew, with still others written in Jewish Hebrew with Old
Hebrew embedded in them. Clearly these scribes viewed Old Hebrew as a
conservative, archaizing diascript of Jewish Hebrew, or conversely,
Jewish Hebrew as a modern counterpart of Old Hebrew. (That this was not
merely the retention of old, perhaps somewhat illegible manuscripts by
trained scribes, is shown by the fact that BOTH diascripts were used in
If we have two applicable encodings available, will we use both or just
one of them for these texts? If we use both, text processing just became
more complicated. If we use one, we are ignoring an encoding made
explicitly available for one of the diascripts. But what is worse, if
somebody else has different practices than we do (and they WILL), text
processing has just become a "minefield" for everybody.
To me, this appears to be EXACTLY parallel to the use of Fraktur and
Roman in German: deunifying these diascripts would create the same text
processing problems for Second Temple Hebrew that deunifying Fraktur and
Roman would create for German.
Clearly, unlike Mark Shoulson's experiments with modern Hebrew readers,
Second Temple Hebrew readers read BOTH diascripts side by side. And we,
who do research in this period, try to put ourselves in their sandals.
>> Obviously there is a need, in many cases, to maintain the distinction
>> between the various diascripts; the question is where should that
>> distinction be introduced - at the encoding level or higher? ...
>> But, what I'm afraid of with this proposal, as I've stated before, is
>> that its adoption will set a precedent that will result in a
>> snowballing of West Semitic encodings,
>All I have said is that I'm persuaded that something distinct should be
>encoded -- at the character encoding level, not in markup.
But WHY? We need EXPLICIT reasons to justify a new encoding. Just saying
that somebody wants it in XML because their font won't show up is
insufficient justification, especially when the repercussions in the
scholarly communities who actually use this stuff could be disruptive.
>> * Separately encode Phoenician, Old Hebrew, Samaritan, Archaic Greek,
>> Aramaic, Official Aramaic, Hatran, Nisan, Armazic, Elymaic, Palmyrene,
>> Mandaic, Jewish Aramaic, Nabataean ...
>I don't think anybody is looking for that many distinctions to be made.
I certainly hope not.
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897