[hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)

From: Jim Allan (jallan@smrtytrek.com)
Date: Sat Dec 27 2003 - 00:01:10 EST


    Mark E. Shoulson wrote:

    > This is a particularly cogent point. The Mishna (c. 1st century C.E.)
    > does explicitly distinguish between Paleo-Hebrew and Square Hebrew
    > (tractate Yadayim 4:5). That's not a font-difference, that's a
    > script-difference, I think.

    There were no such things as fonts in the 1st century C.E. So it would
    have to be a script-difference. But what is a "script"?

    "Script", as I pointed out previously, is a word of wide meaning. The
    difference between Paleo-Hebrew and Square Hebrew is a script
    difference. But the word "script" is also used for different varieties
    of the Square Hebrew script. Check in Google for ["rashi script"] or
    ["ari script"] . There is a two-volume book:
    _Specimens of Medieval Hebrew Scripts_ by Malachi Beit-Arié. See
    http://www.bookgallery.co.il/content/english/static/book8177.asp

    Check also in Google for ["italic script"], ["uncial script"],
    ["blackletter script" OR "black letter script"].

    We are talking about exactly the same alphabet (or abjad) here,
    twenty-two letters in the same order with identical meaning originating
    from the same sources recording the identical text with identical spelling.

    Compare the gradual change from blackletter "scripts" to Antiqua style
    Latin characters (including the italic script) in Renaissance and
    post-Renaissance Europe. This is similar to the change from Phoenician
    style to Aramaic style.

    > This is the other really significant point: Semitic scholars may all
    > agree, but all the world is not Semitic scholarship, and non-{Semitic
    > scholars} have to be satisfied as well. Since the Semitic scholars
    > are also getting what they want, where's the harm in encoding more
    > alphabets?

    Who are these non-scholars who want the Palmyrene script (for example)
    to be encoded separately from other Aramaic scripts? Who are the
    scholars who want this? How many persons in the world want Palmyrene to
    be encoded separately? As many as fifty? Or is there just Michael Everson?

    There may be some such scholars, and if so I would like to hear the
    arguments they would bring forth. I'm willing to be convinced by
    arguments. I'm not an *expert* in Aramaic scripts. There aren't that
    many who are.

    As to harm, where's the harm in encoding Japanese kanzi separately, or
    Latin uncial, or a complete set of small capitals as a third case?
    Where's the harm in encoding Latin Renaissance scripts separately?

    No harm perhaps, but no good either. There is no need or use for such
    encodings. Scholars using Latin letters and non-scholars using Latin
    letters are not asking for separate coding of the script used in the
    Beowulf manuscript and so forth. They don't want every Latin "script"
    variation encoded separately.

    > It's not *that* simple: one could argue (as is being done) that more
    > alphabets would lead to confusion about which one should be used, and
    > mess up searches. I guess we'd just have to make sure that people
    > doing scholarly work in Semitic languages know to use Hebrew all the
    > time (they already know that), no matter what the language.

    But the point is that many of these Semitic languages use the *same*
    abjad with different styling, one such styling being the letters encoded
    in Unicode as Hebrew letters with default glyphs of modern Hebrew form.
    Only the letter shapes are different. But between some northwest Semitic
    "scripts" they are not very different, less so than between one Latin
    "script" and another.

    Second, people doing work in Semitic languages using the Latin alphabet
    do also often use Latin transliterations (which do not all agree). I
    assume that there are also standard Cyrillic transliterations used by
    scholars using the Cyrillic alphabet and so forth.

    Such things are not for Unicode to regulate.

    > And in cases where material is to be incorporated from non-scholarly
    > sources who used another alphabet, that can be transcoded when entered
    > into databases to keep them uniform if that's what's necessary, but
    > presumably that wouldn't happen often.

    What non-scholarly sources? Why would a non-scholar *need* or *desire*
    Palmyrene Aramaic encoded separately while a scholar would not? A change
    to a Palmyrene Aramaic font would do the job as well, for Palmyrene
    Aramaic and any of the various Aramaic "scripts" or "styles" just as a
    font change does for historical styles for European scripts if someone
    wants to print or display them. In fact such fonts do poorly, just as a
    general black letter medieval font will do poorly for anything but the
    exact manuscript on which it was based, if based on a particular
    manuscript. There are no fonts before modern times, no exactly
    standardized characters, no exactly standardized type styles. Every
    scribe has a different hand. Characters in simple charts of Semitic
    scripts are often deceptive just as charts of forms taken by medieval
    Latin characters in particular "scripts"/"styles" are deceptive, often
    being a choice made by a scholar from many variants.

    Coding Aramaic generally as a single script in Unicode would code all
    the "script" variations. This has already been done by encoding the
    square Aramaic letters in their "modern Hebrew" forms. What more is
    needed for encoding? Similarly Latin has been encoded with modern Latin
    letter forms as the default glyphs and Greek has been encoded with
    modern Greek letter forms as the default glyphs. One might want some
    further final forms and additional punctuation for Aramaic styles (or
    might not). That can be decided. Otherwise, there is nothing much more
    to do, save perhaps add a matrix somewhere showing variant glyphs in
    different Aramaic "scripts"/"styles".

    To take another example, all runic "scripts" have been unified in
    Unicode, though the runic "scripts" vary greatly in the number of
    letters used and in the values of the letters as well as in their
    appearance. There is more *reason* to produce separate encodings for the
    various runic scripts than for northwest Semitic "scripts", though I've
    heard no complaints about the unification of runic "scripts" and I have
    no complaints myself.

    Indeed, there is no *reason* when looking at the values of the
    characters of the Semitic "scripts" related to Phoenician that there
    could not have been a single encoding for the consonants for *all* these
    supposed "scripts" (with separate encodings for the pointings).

    A common Semitic encoding *could* still be added to Unicode, with
    individuals deciding whether or not to use that coding also for Arabic,
    Hebrew and Syriac.

    I am not recommending this.

    I am pointing out how much these scripts are seen to be stylistic
    variants of one another to one who can to some extent read them.

    If one must split them up, charts and scholarly books do provide normal
    divisions of "scripts" or "styles" which correspond to those given by
    Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf

    All that has been well worked out for the common "scripts". A normal
    division is:

    1.) Proto-Sinaitic and other early pictographs.
    2.) Old Arabic "scripts" (Old South Arabic and Old North Arabic).
    3.) Northwest Semitic (the 22-character abjad including Phoenician
    scripts, descendant Aramaic scripts such as square Aramaic used for
    Hebrew and also including Syriac).
    4.) Arabic (which though descended from Nabatean Aramaic became so
    different that it might be better encoded separately, perhaps to be
    compared to the Aramaic scripts in somewhat the same way as Latin might
    be compared to early Greek scripts).

    The common 22-character Northwest Semitic abjad can be broken down into:

    1.) Phoenician/Canaanite scripts including Paleo-Hebrew and its
    descendant Samaritan and also Paleo-Aramaic.
    2.) Later Aramaic scripts.
    3.) Syriac scripts which differ greatly in appearance from the other
    Aramaic scripts.

    Note: special appearance and pointing for Hebrew and Syriac is really
    the only reason to distinguish these particularly. The letters are the
    same in origin and are more the same in meaning than between Greek
    script and variant Greek script. Greek letters in variant Greek scripts
    however are (generally) far more alike in appearance than the characters
    of the various early northwest Semitic "scripts"/"styles".

    But should a difference in appearance count in a decision to code
    separately within Unicode when *every* other feature of two "scripts" is
    identical, including origin?

    Hebrew scriptures were first written in the Phoenician script (=
    Paleo-Hebrew), then in Aramaic script which developed *very* slightly in
    medieval times to the normal modern Hebrew script. Everson's division
    would suggest four different scripts ought to be used for coding the
    same texts with the same logical characters with the same names, that
    texts should be encoded as Phoenician or Aramaic or Hebrew or Samaritan
    depending on style, even when letter-by-letter the same.

    Cursive Hebrew still retains for some letter forms the Phoenician shapes
    (which is very strange). Should cursive Hebrew therefore be encoded
    separately?

    I don't see any purpose in encoding these scripts differently in Unicode
    when they represent *exactly* the same abjad with only different styling
    of the characters.

    Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf could
    only say:

    << Note that Jony Rosenne once suggested that we should not encode
    Phoenician because it is a glyph variant of Hebrew. This is not true,
    despite the one-to-one correspondence of character entities. In the
    Dead Sea Scrolls, for instance, where the Tetragrammaton is written
    with Paleo-Hebrew letters, it is (in UCS encoding terms) the Phoenician
    script in which the Name is written. >>

    First, there is not *just* a one-to-one correspondence of character
    entities but also one-to-one correspondence of the characters in respect
    to their origin and names. They *are* the same abjad in all but style.
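
    As an illustration of that one-to-one correspondence, here is a minimal
    Python sketch. It assumes a Unicode version later than the one current
    at the time of writing, one in which a separate Phoenician block
    (U+10900..U+10915) exists. The names HEBREW_TO_PHOENICIAN and
    to_phoenician are hypothetical. The 22 non-final Hebrew letters are
    paired with the 22 Phoenician letters purely by position in the abjad.

```python
import unicodedata

# Hebrew letters alef..tav occupy U+05D0..U+05EA: 22 letters plus 5 final forms.
hebrew = [chr(cp) for cp in range(0x05D0, 0x05EB)]
finals = [c for c in hebrew if "FINAL" in unicodedata.name(c)]
base_hebrew = [c for c in hebrew if c not in finals]

# Assumption: a Phoenician block at U+10900..U+10915 (22 letters, same order).
phoenician = [chr(cp) for cp in range(0x10900, 0x10916)]

# Pair the letters position by position; no table of shapes is needed.
HEBREW_TO_PHOENICIAN = dict(zip(base_hebrew, phoenician))


def to_phoenician(text):
    """Map Hebrew letters to Phoenician ones; leave other characters alone."""
    out = []
    for ch in text:
        if ch in finals:
            # In the Hebrew block each final form directly precedes its base letter.
            ch = chr(ord(ch) + 1)
        out.append(HEBREW_TO_PHOENICIAN.get(ch, ch))
    return "".join(out)
```

    Applied to the Tetragrammaton, to_phoenician("\u05D9\u05D4\u05D5\u05D4")
    gives U+10909 U+10904 U+10905 U+10904, that is, the same four letters in
    the same order with only the style changed.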

    Second, if it is argued that the use of Phoenician script for the
    Tetragrammaton in some texts otherwise written in square Aramaic
    characters indicates that Phoenician and square Aramaic characters must
    be encoded separately within Unicode, should not one make the same
    argument for medieval texts with a headline "script" imitating
    traditional Roman square capitals, initial paragraphs in uncial "script"
    and the main text in Carolingian "script" including majuscule and
    minuscule letters?

    If Everson's argument is applied to medieval manuscripts, uncial
    "script" and Carolingian "script" and Roman capitals should be encoded
    separately within Unicode.

    Also, the Tetragrammaton is represented in the English King James
    translation of Hebrew scriptures and in some more recent translations by
    the word LORD and sometimes GOD in which all but the first letter is
    printed in small capitals. Should small capitals therefore be encoded
    separately in Unicode?

    (Note: these small capitals are the small capitals normally used for
    emphasis and usually appear slightly higher than the normal lowercase
    characters lacking ascenders. They are not the same as the lower case
    small capital characters coded in Unicode as phonetic characters which
    properly appear as identical in height to other lower case characters.)

    That characters of one style are used in a text written predominately in
    another style does not indicate that the "script" or "style" to which
    they belong needs to be coded independently. That is what markup is for.
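
    The distinction drawn in the note above can be checked with a short
    Python sketch using the standard unicodedata module. The helper names
    describe and small_caps_markup are hypothetical, and the CSS property
    shown is only one possible choice of markup. The sketch shows that the
    phonetic small capitals are separate lowercase characters, while the
    small capitals of "LORD" would stay ordinary letters with styling applied
    from outside the text.

```python
import unicodedata

# Phonetic small capitals G, R and L, encoded as characters in their own right.
PHONETIC_SMALL_CAPS = ["\u0262", "\u0280", "\u029F"]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def small_caps_markup(word):
    """Leave the text untouched and request small capitals through markup."""
    return f'<span style="font-variant: small-caps">{word}</span>'


for ch in PHONETIC_SMALL_CAPS:
    print(describe(ch))  # general category is "Ll", i.e. lowercase letter

print(small_caps_markup("Lord"))
```

    The encoded text in the second case remains the plain word; only the
    markup asks for a different style.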

    Peter Kirk has already made this point in part.

    There seems to me *no* reason why most of Aramaic "scripts" should not
    be unified within Unicode with Hebrew and almost *no* reason why
    Phoenician and Samaritan should not be unified.

    And there seems to me *little* reason why Hebrew/Aramaic "scripts" and
    Phoenician/Samaritan "scripts" should not be unified. The two families
    of styles use the same abjad though with differences in appearance too
    great for most of the letters to be seen as the same letters between the
    two families by appearance alone.

    But how much should visual distinction count when it is the *sole*
    difference? It appears to me that this is where dispute lies mostly,
    despite the precedent of the Unicode encoding of runic "scripts".

    There may also be some thinking of HTML/XML/XHTML web display of
    characters where forcing of font is not reliable. One would not want a
    discussion of ancient Phoenician characters to display modern Hebrew
    forms! But this same problem currently applies to runes, medieval Latin
    characters, Han characters and so forth. One shouldn't let the current
    shortcomings of one display method among many dictate Unicode encodings.

    Jim Allan


