Date: Wed Aug 06 2003 - 17:33:38 EDT
TEMPORARILY in Madison Wisconsin
I am posting this to both regular Unicode and
the new Hebrew list. Please reply off-list or
to the Hebrew list if you want me to see what
you wrote immediately.
I am responding at great length to the Roadmap proposals
for the Semitic dialects Mandaic, Early Aramaic, and
Samaritan. BTW, the larger "phylum" for these dialects
is called Afroasiatic.
I am strongly AGAINST all three proposals.
Samaritan is a Hebrew dialect, still used today in Israel
in worship/liturgy and probably elsewhere in the Middle East,
with a series of different vowel and other marks, many of
them derived from Arabic. Mandaic and all other Aramaics
are "cousin" languages to Hebrew, Arabic, etc.
Today there are probably 10-20 Aramaic dialects in use,
written in 6 scripts, with maybe 200,000 speakers, but not
even one of them is still called "Aramaic."
When the Ideographic Rapporteur Group worked on Chinese etc.,
they were fortunate. There had been 200 earlier suggestions
on how to computerize Chinese. There was even a
"Chinese Language Computer Society."
But Afroasiatic -- Aramaic, Syriac, Mandaic, Egyptian, Somali,
Hausa, Hebrew, Samaritan, Amorite, Yaudic, Tigrinya, Arabic,
Berber, Moabite, Coptic -- has not fared as well
as CJKV. Afroasiatic has been written continuously even
longer than Chinese -- 5,200 years so far (Ancient Egyptian) --
but with far more writing systems.
Ancient and Modern Afroasiatic combined has the
2ND LARGEST GROUP OF CHARACTERS after CJKV.
An acronym might be EAUSAS -- Egyptian, Akkadian, Ugaritic,
Semitic Alphabetic and Syllabic.
I estimate that EAUSAS has possibly 5,000 characters,
4,600 from the ancient world, maybe 400 in regular use
today. The sticky part is that the 400 characters
used today were used, maybe in another font shape, also
1,600 - 3,200 years ago, depending on the alphabetic or
A fully functional Semitic / Afroasiatic electronic
dictionary or multi-text-database system would use
the entire EAUSAS character set plus text databases of
Hebrew Bible ("Old Testament"), Talmud, Early Aramaic
epigraphy, Mandaic, Quran... There are cognates found
in all Semitic/wider Afroasiatic languages, so a search
engine would have to search each text and each different
script. Aramaic is actually the worst problem, since
it's been written in 8 scripts over the past 3,000 or so
years (1 cuneiform epigraph, 3 Egyptian scripts, and
a series of alphabetic scripts, even including Roman).
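
To make the search problem concrete, here is a minimal sketch of
what "search each text and each different script" means in
practice when one word is encoded in two separately coded scripts.
Everything in it is an assumption for illustration: the function
names, the tiny folding tables (only three Syriac letters are
mapped), and the sample corpus are hypothetical and not part of
any proposal. It folds Syriac letters and Hebrew final forms to
plain Hebrew letters and drops vowel points, so one query can
match both texts.

```python
import unicodedata

# Hypothetical folding tables; only the letters needed for the demo are listed.
HEBREW_FINALS = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}
SYRIAC_TO_HEBREW = {
    "\u0721": "\u05DE",  # Syriac mim    -> Hebrew mem
    "\u0720": "\u05DC",  # Syriac lamadh -> Hebrew lamed
    "\u071F": "\u05DB",  # Syriac kaph   -> Hebrew kaf
}
FOLD_TABLE = str.maketrans({**HEBREW_FINALS, **SYRIAC_TO_HEBREW})


def fold_key(text: str) -> str:
    """Reduce a word to a script-neutral consonantal search key."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop vowel points and other nonspacing marks (category Mn).
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.translate(FOLD_TABLE)


def search(corpus: dict[str, str], query: str) -> list[str]:
    """Return the names of texts containing a word that folds to the query's key."""
    key = fold_key(query)
    return sorted(
        name
        for name, text in corpus.items()
        if key in {fold_key(word) for word in text.split()}
    )


# Hypothetical two-text corpus: the consonants m-l-k in Hebrew and in Syriac.
SAMPLE_CORPUS = {
    "hebrew_sample": "\u05DE\u05DC\u05DA",
    "syriac_sample": "\u0721\u0720\u071F",
}
```

Calling search(SAMPLE_CORPUS, "\u05DE\u05DC\u05DA") finds the word
in both sample texts. The point of the sketch is only that every
separately encoded script adds another folding table that the
search engine has to carry and maintain.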
WHY I AM AGAINST THE PROPOSALS
So here's the problem, which seems to me a clear
language engineering situation: there are VOLUMINOUS
amounts of material in Egyptian and Akkadian that could be
computerized. The Hebrew Bible has 1,000 pages of
Hebrew and Aramaic, the Talmud has at least 40,000 pages
of Aramaic and Hebrew. There's also quite a bit of Ugaritic,
a unique alphabet.
But for the Early Aramaic, which can be perfectly
represented in modern "Hebrew" square script, there are maybe
3 pages of mostly tiny scraps of text, if that much. For many
of the scraps the question is: what language is this, actually --
Aramaic or something else? But you are proposing a
completely unnecessary script for 3 pages of material, which
would make an overworked search engine go through those 3 pages
in a different way than it handles the
other thousands of pages of Aramaic in the 6 other scripts.
Mandaic is easily represented by Hebrew + one extra letter.
There is more material here, but there is no problem in
seeing it as a variant font.
Samaritan is a Hebrew font variant with interesting different
sets of vowel points. There's no reason to computerize it
separately, despite the exotic shapes.
There are legitimate reasons to computerize Akkadian,
Ugaritic, Egyptian etc. separately---their writing systems
form unique subsets within Afroasiatic.
In total, the early Semitic alphabetic material cannot be
more than 12 pages--counting Moabite, Amorite, Yaudic,
Every scrap of early alphabetic Semitic material has different
letter shapes. It never did become anything like a standard.
Epigraphers (people who read the tiny scraps) spend several years
learning to recognize this material as they get their Ph.D.s
at Harvard, the University of Chicago, ... It's very difficult precisely
because no one writes the letters the same way twice.
Even the "Hebrew" of Unicode did not become "standardized" until
maybe 1800 years ago -- and the standardization period lasted
from 500 B.C. to 300 A.D. At the time of Qumran, some letter shapes
are very familiar and others are still weird. During the earlier
periods nothing was ever standardized.