Re: New contribution

From: Dean Snyder
Date: Wed Apr 28 2004 - 23:01:59 EDT


    Kenneth Whistler wrote at 6:15 PM on Wednesday, April 28, 2004:

    >> Then why were Chinese, Japanese, and Korean unified?
    >Please refer to TUS 4.0, pp. 296-303, and, in particular, Table 11-2.

    >> I'm really not
    >> trying to open a can of worms here,
    >Yes you are.

    Actually, no I'm not. I think CJK is a very apropos example - unification
    in the face of strong (and very politically active) opposition. If THOSE
    diascripts could be unified, then why shouldn't Canaanite be unified, all
    the more because unification not only has strong SUPPORT from its user
    community, but the user community is, in fact, already treating it as
    though it were unified.

    >> but what explicitly are the triggers
    >> for script unification in Unicode? If these were clearly spelled out the
    >> decision should be simple for Phoenician/Hebrew. But if different
    >> criteria are applied in different scenarios then every situation will
    >> generate the kind of ongoing discussion we have had about Phoenician/Hebrew.
    >The latter will pertain, because each situation *is* different, and
    >the easy cases have almost all been dealt with already.
    >The only slam dunks regarding script identity left tend to be the
    >con-scripts, and those are problematical for encoding for other
    >reasons than their identity as scripts.

    I am more a philologist of ancient languages; I am looking to encoding
    experts for input on precisely what they think are the sorts of criteria
    that should trigger dis-unification. Of course, this is, in fact, an
    iterative process of dialog with each side contributing expertise. But I
    would expect you, the encoders, many of whom I suspect are also linguists
    and philologists, to proffer the first round of suggestive dis-
    unification criteria. You've had the experience in this (which is why I
    mentioned CJK as an example). It seems, though, that you are saying
    there are none. At least you don't mention any. I'm actually just
    surprised that it appears to be so ad hoc. Maybe someone should write a
    white paper.

    >You are well aware of the ongoing controversies regarding the
    >exact historical boundaries of the Sumero-Akkadian cuneiform
    >script, for example.

    Actually, I'm not aware of any such controversies. In fact, this example
    argues strongly for the unification of Canaanite alphabetic script.

    Michael Everson is one of the authors of the proposal to encode 2400
    years of cuneiform in one unified encoding. There is far greater
    disparity between Ur III Sumerian and Neo-Babylonian embodied in that
    proposed single encoding than there is between Old Phoenician and Modern
    Israeli Hebrew script. Where's the consistency? Where's the pattern here
    for us to follow?

    >There *are* no axiomatic principles of
    >script identity which can be applied across the board to decide
    >that and all other instances of historical boundaries for a
    >candidate script to have its repertoire of characters separately
    >encoded in Unicode.

    I'm not asking for self-evident principles - I'm asking for TRIED and
    PROVEN principles, or maybe just guidelines, but at least explicit ones.
    If I could expect them from anyone on earth, it would be from you
    encoding jocks.

    >> >To unify Hebrew and Phoenician scripts would be ahistorical at best.
    >> >A silly unification.
    >> Not anywhere near as "silly" as CJK unification.
    >The unification of Han characters was not "silly".

    I agree; nor, likewise, is the unification of Hebrew and Phoenician silly.

    >> The Canaanite script is
    >> a script continuum spread across Phoenician, Punic, Aramaic, Hebrew,
    >> Moabite, and Ammonite communities, all sharing a common script origin
    >> with each developing independently, some more and some less, over the
    >> centuries. Where one dips ones finger in this stream of continuity and
    >> pronounces "script dis-unification!" is not an easy thing to do.
    >Of course, but not in principle different from attempting the same
    >for Greek, Latin, Old Italic, Alpine, and Gothic, for example.

    Except, perhaps, in having more clear-cut distinctions than in Canaanite.

    >> >I am
    >> >actually astonished to see it suggested that it should be unified
    >> >with Hebrew.
    >> I suggest that this is only because you are not an actual reader of
    >> ancient Canaanite/Aramaic/Hebrew texts.
    >As long as we are arguing ad hominem, I might add that you suggest
    >that CJK unification was silly only because you are not an actual
    >reader of ancient Chinese, Japanese, and Korean texts.
    >(Try your argument on for size.)

    Well, for one thing, I have never made, and can't imagine making, a CJK

    But more importantly, you miss my intent - from what I do know about it,
    I do not think CJK unification was silly. (My point was that, if you call
    Hebrew/Phoenician a silly unification, a fortiori, CJK is silly.)

    >> I'm not so sure. But at any rate, you are comparing the endpoints of this
    >> script continuum (Phoenician and modern Hebrew). Before you proceed here,
    >> you'd better decide what criteria you will use to separate out scripts in
    >> this script continuum or we will be right back here having the same
    >> discussions over and over again with people who want to distinguish
    >> between Moabite, Ammonite, Old Aramaic, Imperial Aramaic, Punic, ... in
    >> plain text.
    >Correct. There clearly needs to be consensus among the likely users
    >of a script encoded in Unicode that the repertoire and its encoding
    >actually meets some demonstrable need for text representation. If
    >it does not, then we can skip encoding it and get on with the
    >encoding of something like Tifinagh, for which there are official
    >standards bodies of governments clamoring for encoding for
    >particular repertoires. I suspect I know where the UTC priority
    >will lie if it comes to push or shove between those two scenarios.
    >For the Aramaic script continuum there are two potential easy answers:
    >1. Hebrew is already encoded, so just use Hebrew letters for
    >everything and change fonts for every historical variety.

    Which, along with transliteration, is precisely what the Phoenician
    scholarly community is doing now.
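
    Approach 1 can be sketched concretely. In the sketch below, the `variety`
    and `font` metadata fields and the font name are hypothetical, invented
    purely for illustration: a Phoenician-era word is stored as ordinary
    Hebrew code points, and the historical variety travels outside the plain
    text.

```python
import unicodedata

# Option 1: the plain text uses the existing Hebrew block (U+05D0..U+05EA);
# the historical script variety is out-of-band metadata (markup, font choice),
# not part of the character stream itself.
text = "\u05D0\u05D1"  # alef, bet -- the same code points for every variety

record = {
    "text": text,
    "variety": "Old Phoenician",  # hypothetical metadata field
    "font": "OldCanaaniteFont",   # illustrative name, not a real font
}

for ch in record["text"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

    One consequence: searching and collating then work uniformly across
    varieties, since every variety shares the same code points.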

    >2. Encode a separate repertoire for each stylistically distinct
    >abjad ever recorded in the history of Aramaic studies, from
    >Proto-Canaanite to modern Hebrew (and toss in cursive Hebrew, for
    >that matter), starting with Tables 5.1, 5.3, 5.4, and 5.5 of
    >Daniels and Bright and adding whatever you wish to that.

    There is so much fluidity among such competing classifications that
    freezing any one of them into several standard encodings would cause
    considerable distress for both data and software.

    >But the *correct* answer is likely to be the hard one that carves
    >up that continuum into some useful small set of repertoires to
    >be encoded as separate "scripts" and identifies each of the
    >abjad varieties to be associated with each respective "script",
    >so that extant texts can be correctly encoded in an
    >interoperable way.

    This is what I think should happen. And the operative word here is "hard".
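
    That hard answer amounts to building an explicit variety-to-repertoire
    mapping. A minimal sketch follows; the groupings in it are invented
    placeholders for illustration, not an actual classification proposal.

```python
# Each attested abjad variety is assigned to one encoded "script" repertoire,
# so that extant texts can be encoded in an interoperable way.
# The groupings here are invented placeholders, NOT a real classification.
SCRIPT_OF_VARIETY = {
    "Old Phoenician":   "Old Canaanite",
    "Old Aramaic":      "Old Canaanite",
    "Moabite":          "Old Canaanite",
    "Imperial Aramaic": "Aramaic",
    "Modern Hebrew":    "Hebrew",
}

def repertoire_for(variety: str) -> str:
    """Return the encoded repertoire a given historical variety maps to."""
    return SCRIPT_OF_VARIETY[variety]

print(repertoire_for("Moabite"))  # -> Old Canaanite
```

    The code is trivial; the hard part, of course, is reaching consensus on
    the contents of such a table.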

    >> I'm not saying we shouldn't encode the "landmarks" in the Canaanite
    >> script continuum;
    >You aren't? Good. Then instead of objecting on generic grounds
    >to the Phoenician proposal, answer the following question:
    >A. Does Phoenician constitute a "landmark" in the Canaanite
    > script continuum? Yes/No

    It depends ;-)

    As I've stated on the Hebrew list, my reluctance about the proposal is based
    on two factors:

    1) The script is wrongly called "Phoenician" - the same script was used
    for Old Phoenician, Old Aramaic, Old Hebrew, Moabite, Ammonite, and
    Edomite. That is why I propose it be named "[Old] Canaanite".

    2) Discussions of this proposal have always been closely linked with
    proposals to encode Aramaic and Samaritan. And this is where we step into
    really turbulent waters (to keep my metaphor alive). My only suggestion
    has been that we slow down, do not proceed precipitously, and get more
    scholarly input.

    >And once you answer that question, perhaps you can contribute to
    >a specification of what the rest of the list of appropriate "landmarks"
    >consists of.
    >> I'm only saying that expert opinion is needed in
    >> determining just what those landmarks are,
    >Absolutely. Please provide your expert opinion.

    Expert is a strong word. I am a student of these languages and scripts
    and I have provided my opinions on the Hebrew list. I am only raising
    caution flags and saying we need more expert opinions before proceeding.

    >> based on some set of agreed
    >> upon criteria.
    >But do not expect a set of axioms to be provided to you that will
    >make answering the questions easy.

    Well, that's what I was hoping you all could provide. I'm still not so
    sure that there is not at least some collective wisdom on this matter in
    this group here.

    >The very nature of the problem requires reaching a consensus
    >among users of the proposed text encoding regarding what
    >text representation purpose it is intended for, and within
    >that context, what the useful boundaries of encoding would
    >be. That is an *operational* definitional problem, not an
    >*axiomatic* one.

    I've never asked for axioms.

    Such operational problems can be spelled out explicitly - no?

    >The question to ask is: Does this proposed identification of
    >script *make sense* for the text representational use it is
    >intended for?

    Ken, you really do have a good way of getting to the core of a problem
    sometimes; and this is one of those times.

    The problem is that what we do with Phoenician now impacts what we will
    do with the other related "scripts" later.

    As I've indicated here and elsewhere, I do believe that an Old Canaanite
    script encoding is in general a good idea; what I'm very cautious about
    is the Aramaic/Samaritan baggage usually, and so casually and
    insistently, associated with it. I think we need to look judiciously at
    the whole spectrum first before we start chopping it up prematurely.

    >The question not to ask is: Where is the set of criteria
    >whereby I can determine whether "Unicode" considers Ammonite
    >and Moabite to be the same script or not?

    Well, frankly we need more than nothing from you (plural).

    >P.S. In case anyone should wish to misconstrue my position, I
    >am *not* an expert on Aramaic, and I do *not* have a preconceived
    >opinion about whether Phoenician should or should not be
    >encoded as a distinct script, and if encoded as a distinct script,
    >what other varieties of Aramaic script it would be considered
    >distinct from. Neither I nor the company I represent has a
    >burning need for this encoding, so I am depending on expert
    >testimony from those who do have such needs to inform me regarding
    >what the best way forward would be when this actually comes up
    >to the UTC for decisions.

    I'll try to find the time to enlist more of this expert advice.


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897

    This archive was generated by hypermail 2.1.5 : Wed Apr 28 2004 - 23:43:19 EDT