From: Jim Allan (jallan@smrtytrek.com)
Date: Sat Dec 27 2003 - 00:01:10 EST
Mark E. Shoulson wrote:
> This is a particularly cogent point. The Mishna (c. 1st century C.E.)
> does explicitly distinguish between Paleo-Hebrew and Square Hebrew
> (tractate Yadayim 4:5). That's not a font-difference, that's a
> script-difference, I think.
There were no such things as fonts in the 1st century C.E. So it would
have to be a script-difference. But what is a "script"?
"Script", as I pointed out previously, is a word of wide meaning. The
difference between Paleo-Hebrew and Square Hebrew is a script
difference. But the word "script" is also used for different varieties
of the Square Hebrew script. Check in Google for ["rashi script"] or
["ari script"] . There is a two-volume book:
_Specimens of Medieval Hebrew Scripts_ by Malachi Beit-Arié. See
http://www.bookgallery.co.il/content/english/static/book8177.asp
Check also in Google for ["italic script"], ["uncial script"],
["blackletter script" OR "black letter script"].
We are talking about exactly the same alphabet (or abjad) here,
twenty-two letters in the same order with identical meaning originating
from the same sources recording the identical text with identical spelling.
Compare the gradual change from blackletter "scripts" to Antiqua style
Latin characters (including the italic script) in Renaissance and
post-Renaissance Europe. This is similar to the change from Phoenician
style to Aramaic style.
> This is the other really significant point: Semitic scholars may all
> agree, but all the world is not Semitic scholarship, and non-{Semitic
> scholars} have to be satisfied as well. Since the Semitic scholars
> are also getting what they want, where's the harm in encoding more
> alphabets?
Who are these non-scholars who want the Palmyrene script (for example)
to be encoded separately from other Aramaic scripts? Who are the
scholars who want this? How many persons in the world want Palmyrene to
be encoded separately? As many as fifty? Or is there just Michael Everson?
There may be some such scholars, and if so I would like to hear the
arguments they would bring forth. I'm willing to be convinced by
arguments. I'm not an *expert* in Aramaic scripts. There aren't that
many who are.
As to harm, where's the harm in encoding Japanese kanji separately, or
Latin uncial, or a complete set of small capitals as a third case?
Where's the harm in encoding Latin Renaissance scripts separately?
No harm perhaps, but no good either. There is no need or use for such
encodings. Scholars using Latin letters and non-scholars using Latin
letters are not asking for separate coding of the script used in the
Beowulf manuscript and so forth. They don't want every Latin "script"
variation encoded separately.
> It's not *that* simple: one could argue (as is being done) that more
> alphabets would lead to confusion about which one should be used, and
> mess up searches. I guess we'd just have to make sure that people
> doing scholarly work in Semitic languages know to use Hebrew all the
> time (they already know that), no matter what the language.
But the point is that many of these Semitic languages use the *same*
abjad with different styling, one such styling being the letters encoded
in Unicode as Hebrew letters with default glyphs of modern Hebrew form.
Only the letter shapes are different. And between some northwest Semitic
"scripts" they are not very different, less so than between one Latin
"script" and another.
Second, scholars working in Semitic languages who use the Latin alphabet
also often use Latin transliterations (which do not all agree). I
assume that there are also standard Cyrillic transliterations used by
scholars using the Cyrillic alphabet and so forth.
Such things are not for Unicode to regulate.
> And in cases where material is to be incorporated from non-scholarly
> sources who used another alphabet, that can be transcoded when entered
> into databases to keep them uniform if that's what's necessary, but
> presumably that wouldn't happen often.
What non-scholarly sources? Why would a non-scholar *need* or *desire*
Palmyrene Aramaic encoded separately while a scholar would not? A change
to a Palmyrene Aramaic font would do the job as well, for Palmyrene
Aramaic and any of the various Aramaic "scripts" or "styles" just as a
font change does for historical styles of European scripts if someone
wants to print or display them. In fact such fonts often do poorly, just
as a general blackletter medieval font will do poorly for anything but the
exact manuscript on which it was based, if based on a particular
manuscript. There are no fonts before modern times, no exactly
standardized characters, no exactly standardized type styles. Every
scribe has a different hand. Characters in simple charts of Semitic
scripts are often deceptive just as charts of forms taken by medieval
Latin characters in particular "scripts"/"styles" are deceptive, often
being a choice made by a scholar from many variants.
Coding Aramaic generally as a single script in Unicode would code all
the "script" variations. This has already been done by encoding the
square Aramaic letters in their "modern Hebrew" forms. What more is
needed for encoding? Similarly Latin has been encoded with modern Latin
letter forms as the default glyphs and Greek has been encoded with
modern Greek letter forms as the default glyphs. One might want some
further final forms and additional punctuation for Aramaic styles (or
might not). That can be decided. Otherwise, there is nothing much more
to do, save perhaps add a matrix somewhere showing variant glyphs in
different Aramaic "scripts"/"styles".
To take another example, all runic "scripts" have been unified in
Unicode, though the runic "scripts" vary greatly in the number of
letters used and in the values of the letters as well as in their
appearance. There is more *reason* to produce separate encodings for the
various runic scripts than for northwest Semitic "scripts", though I've
heard no complaints about the unification of runic "scripts" and I have
no complaints myself.
Indeed, there is no *reason* when looking at the values of the
characters of the Semitic "scripts" related to Phoenician that there
could not have been a single encoding for the consonants for *all* these
supposed "scripts" (with separate encodings for the pointings).
A common Semitic encoding *could* still be added to Unicode, with
individuals deciding whether or not to use that coding also for Arabic,
Hebrew and Syriac.
I am not recommending this.
I am pointing out how readily these scripts are seen as stylistic
variants of one another by anyone who can to some extent read them.
If one must split them up, charts and scholarly books do provide normal
divisions of "scripts" or "styles" which correspond to those given by
Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf
All that has been well worked out for the common "scripts". A normal
division is:
1.) Proto-Sinaitic and other early pictographs.
2.) Old Arabic "scripts" (Old South Arabic and Old North Arabic).
3.) Northwest Semitic (the 22-character abjad including Phoenician
scripts, descendant Aramaic scripts such as square Aramaic used for
Hebrew and also including Syriac).
4.) Arabic (which though descended from Nabatean Aramaic became so
different that it might be better encoded separately, perhaps to be
compared to the Aramaic scripts in somewhat the same way as Latin might
be compared to early Greek scripts).
The common 22-character Northwest Semitic abjad can be broken down into:
1.) Phoenician/Canaanite scripts including Paleo-Hebrew and its
descendant Samaritan and also Paleo-Aramaic.
2.) Later Aramaic scripts.
3.) Syriac scripts which differ greatly in appearance from the other
Aramaic scripts.
Note: special appearance and pointing for Hebrew and Syriac are really
the only reasons to distinguish these in particular. The letters are the
same in origin and agree more closely in meaning than letters do between
one variant Greek script and another. Greek letters in variant Greek
scripts, however, are (generally) far more alike in appearance than the
characters of the various early northwest Semitic "scripts"/"styles".
But should a difference in appearance count in a decision to code
separately within Unicode when *every* other feature of two "scripts" is
identical, including origin?
Hebrew scriptures were first written in the Phoenician script (=
Paleo-Hebrew), then in Aramaic script which developed *very* slightly in
medieval times to the normal modern Hebrew script. Everson's division
would suggest that four different scripts ought to be used for coding the
same texts with the same logical characters with the same names, that
texts should be encoded as Phoenician or Aramaic or Hebrew or Samaritan
depending on style, when letter by letter they are the same.
Cursive Hebrew still retains Phoenician shapes for some letter forms
(which is very strange). Should cursive Hebrew therefore be encoded
separately?
I don't see any purpose in encoding these scripts differently in Unicode
when they represent *exactly* the same abjad with only different styling
of the characters.
Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf could
only say:
<< Note that Jony Rosenne once suggested that we should not encode
Phoenician because it is a glyph variant of Hebrew. This is not true,
despite the one-to-one correspondence of character entities. In the Dead
Sea Scrolls, for instance, where the Tetragrammaton is written with
Paleo-Hebrew letters, it is (in UCS encoding terms) the Phoenician
script in which the Name is written. >>
First, there is not *just* a one-to-one correspondence of character
entities but also one-to-one correspondence of the characters in respect
to their origin and names. They *are* the same abjad in all but style.
Second, if it is argued that the use of Phoenician script for the
Tetragrammaton in some texts otherwise written in square Aramaic
characters indicates that Phoenician and square Aramaic characters must
be encoded separately within Unicode, should not one make the same
argument for medieval texts with a headline "script" imitating
traditional Roman square capitals, initial paragraphs in uncial "script"
and the main text in Carolingian "script" including majuscule and
minuscule letters?
If Everson's argument is applied to medieval manuscripts, uncial
"script" and Carolingian "script" and Roman capitals should be encoded
separately within Unicode.
Also, the Tetragrammaton is represented in the English King James
translation of Hebrew scriptures and in some more recent translations by
the word LORD and sometimes GOD in which all but the first letter is
printed in small capitals. Should small capitals therefore be encoded
separately in Unicode?
(Note: these small capitals are the small capitals normally used for
emphasis, which usually appear slightly taller than normal lowercase
characters lacking ascenders. They are not the same as the small capital
characters encoded in Unicode as phonetic characters, which properly
appear identical in height to other lowercase characters.)
That characters of one style are used in a text written predominantly in
another style does not indicate that the "script" or "style" to which
they belong needs to be coded independently. That is what markup is for.
Peter Kirk has already made this point in part.
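The markup approach can be sketched in a few lines of Python: the
underlying characters remain Square Hebrew codepoints throughout, and a
style class (the class name "paleo" is an assumption here, as is the use
of HTML-style spans) asks the renderer for Paleo-Hebrew glyphs for the
Tetragrammaton alone:

```python
# Sketch of style-by-markup: the text stays encoded as Square Hebrew;
# only the (hypothetical) style class selects Paleo-Hebrew glyphs.
TETRAGRAMMATON = "\u05D9\u05D4\u05D5\u05D4"  # yod, he, vav, he

def styled(text: str, style_class: str) -> str:
    """Wrap text in a span carrying a style class; encoding is unchanged."""
    return f'<span class="{style_class}">{text}</span>'

# The Name in Paleo-Hebrew style, inside a square-Aramaic-styled text.
fragment = styled(TETRAGRAMMATON, "paleo")
```

Searches and collation see the same four Hebrew codepoints either way;
only the presentation layer changes.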
There seems to me *no* reason why most of the Aramaic "scripts" should
not be unified within Unicode with Hebrew and almost *no* reason why
Phoenician and Samaritan should not be unified.
And there seems to me *little* reason why Hebrew/Aramaic "scripts" and
Phoenician/Samaritan "scripts" should not be unified. The two families
of styles use the same abjad though with differences in appearance too
great for most of the letters to be seen as the same letters between the
two families by appearance alone.
But how much should visual distinction count when it is the *sole*
difference? It appears to me that this is mostly where the dispute lies,
despite the precedent of the Unicode encoding of runic "scripts".
There may also be some thinking of HTML/XML/XHTML web display of
characters where forcing a font is not reliable. One would not want a
discussion of ancient Phoenician characters to display modern Hebrew
forms! But this same problem currently applies to runes, medieval Latin
characters, Han characters and so forth. One shouldn't let the current
shortcomings of one display method among many dictate Unicode encodings.
Jim Allan
This archive was generated by hypermail 2.1.5 : Sat Dec 27 2003 - 01:33:46 EST