From: Dean Snyder (email@example.com)
Date: Tue Jan 13 2004 - 16:23:32 EST
Two basic models for encoding cuneiform have been discussed - dynamic and static:
* The dynamic model would encode approximately 300 base (or "simple",
or "primitive") cuneiform characters along with 14 character modifiers in
a system that would allow cuneiformists to dynamically create "all" cuneiform signs.
* The static model would hard code "all" of the approximately 1000
cuneiform signs, to include base signs, base signs that have been
modified, and base signs that have other base signs embedded within them.
The differences between the two systems are roughly analogous to the
differences between encoding the character "A" and the character "ACUTE
ACCENT" as separate code points versus encoding "A WITH ACUTE ACCENT" as
a single code point. The dynamic model is more elegant and extensible,
but more complex; the static model is more brute force and fixed, yet simpler.
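The analogy can be made concrete with a script Unicode already encodes both ways: Latin has a precomposed A-acute and an equivalent combining sequence, and normalization maps between the two. A minimal Python sketch:

```python
import unicodedata

precomposed = "\u00C1"        # LATIN CAPITAL LETTER A WITH ACUTE ("static" style)
combining = "\u0041\u0301"    # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT ("dynamic" style)

# The two encodings are canonically equivalent; normalization converts between them.
assert unicodedata.normalize("NFC", combining) == precomposed
assert unicodedata.normalize("NFD", precomposed) == combining

print(len(precomposed), len(combining))  # 1 code point versus 2 code points
```

The static model corresponds to the first style of encoding; the dynamic model to the second.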
Cuneiform as a script system was dynamic in its early periods; the
scribes productively introduced new signs with new meanings by applying
various standardized modifications to base signs. For one example, see
the graphic of the LU2, "human being", sign with several of its
modifications attached to this email. For several more representative
examples, see <http://www.jhu.edu/ice/basesigns/>, and for a "complete"
repertoire of the base and modified signs see the 1.3 MB PDF file at
(All images are from screen shots of Steve Tinney's Classic Cuneiform font.)
Recently I proposed we re-think the decision made at the Initiative for
Cuneiform Encoding conferences to statically encode cuneiform. The
reaction has been mixed, but I consider only two of the objections
material. (I have appended excerpts from the various reactions, along
with some of my responses, to the end of this email.)
(OBJECTION) The dynamic model is too fragile; unencoded glyphs will be
"hidden" in the Private Use Area or in OpenType glyph tables.
(RESPONSE) Not being a font designer, I called a font designer friend
of mine and he DID say there are tool problems and operating system
problems associated with non-code-point-specified glyphs in OpenType. He
specifically mentioned Volt and FontLab. For what it's worth, I have seen
a difference between Jaguar and Panther in how Mac OS X treats characters
in the PUA - in Panther they commonly show up as the indeterminate-glyph
symbol even when a suitable font that worked under Jaguar is installed.
(OBJECTION) The dynamic model is too complex; it will require a specified syntax.
(RESPONSE) Yes, we will need to specify a syntax and associated
properties for the modifier characters, namely what the permissible
character sequences are and how the modifier characters react with the
base characters and with one another.
I have identified 14 cuneiform modifiers as candidates for encoding and I
divide them into 3 major sub-groups:
Gunu - parallel, small wedges added to a base sign
Sheshig - parallel "winkelhakens" added to a base sign
Nutillu - wedges deleted from a base sign
Curved - curvature added to the base sign (used only for numbers)
Tenu - slant a sign 45 degrees clockwise
Inverse - flip a sign 180 degrees vertically
Reverse - flip a sign 180 degrees horizontally
Infix - embed one sign in another
Affix - place one sign after an infixed sign
Cross - cross two of the same signs
Oppose - oppose two of the same signs
Square - arrange four of the same signs in a cross
Superpose - place one sign over another
Postfix - place one sign after another (making a compound sign)
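To give a feel for what specifying such a syntax might involve, here is a minimal sketch, in Python, of a checker for one hypothetical modifier grammar - GROUP = BASE DECORATOR*, SEQUENCE = GROUP (POSITIONER GROUP)*. None of the sign names or category assignments below are part of any actual proposal; they are placeholders for illustration only:

```python
# Hypothetical sketch only: these categories and rules illustrate what a
# modifier syntax might specify; they are not part of any proposal.
BASES = {"LU2", "ESH2", "GAL"}                   # example base signs
DECORATORS = {"GUNU", "SHESHIG", "TENU"}         # modify the preceding base sign
POSITIONERS = {"INFIX", "SUPERPOSE", "POSTFIX"}  # join two sign groups

def is_valid(seq):
    """Accept sequences of the form GROUP (POSITIONER GROUP)*,
    where GROUP = BASE DECORATOR*."""
    n = len(seq)

    def group(i):
        # Parse BASE DECORATOR*; return index after the group, or None.
        if i >= n or seq[i] not in BASES:
            return None
        i += 1
        while i < n and seq[i] in DECORATORS:
            i += 1
        return i

    i = group(0)
    if i is None:
        return False
    while i < n:
        if seq[i] not in POSITIONERS:
            return False
        i = group(i + 1)
        if i is None:
            return False
    return True

assert is_valid(["LU2", "GUNU"])                   # modified base sign
assert is_valid(["LU2", "INFIX", "ESH2", "TENU"])  # LU2 with slanted ESH2 infixed
assert not is_valid(["GUNU", "LU2"])               # a modifier cannot come first
```

The real specification would additionally have to assign character properties to the modifiers (combining classes, normalization behavior, and so on), which is exactly the work the objection points at.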
I will suggest syntax rules for these modifiers in a subsequent email. In
the meantime, I would appreciate any technical feedback to the issues
presented here. (For instance, as an example of something I haven't
discussed here, how should markup affect our decision?)
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
SUMMARY OF RESPONSES TO A PROPOSAL TO DYNAMICALLY ENCODE CUNEIFORM
FOR A DYNAMIC MODEL
1 It's more powerful.
2 It fits best with the Unicode model.
3 It mirrors the way the actual script works.
4 It allows for new signs without new encodings.
I agree with all these assessments.
AGAINST A DYNAMIC MODEL
(My responses follow each in parentheses.)
1 It's too much work to do a dynamic model proposal now. (Actually,
it's not. We already have all the modifiers identified and all the base
characters both identified and designed into two existing fonts. We only
need to specify the syntax and properties for the modifiers.)
2 It's too late to change to a dynamic model. (No, it's not. See # 1 above.)
3 There won't be that many newly discovered signs. (That's an educated
guess, not a fact.)
4 One model is as good as the other, so let's just stick with what we
have. (One model is NOT as good as the other, not if one accepts the
assessments listed above in favor of the dynamic model.)
5 Complex sign shapes can be context-bound and the dynamic model makes
this more difficult to capture. (This is a font issue.)
6 Dynamic glyphs are much more difficult to render - it was rejected
for CJK. (Then perhaps we should throw out Unicode IPA and Devanagari and
Hebrew, ... and encode all the possible character combinations as
separate code points? For example, how is CUNEIFORM SIGN LU2 + CUNEIFORM
POSITIONER INFIX + CUNEIFORM SIGN ESH2 + CUNEIFORM DECORATOR TENU any
more complex to render than say HEBREW LETTER DALET + HEBREW POINT DAGESH
+ HEBREW POINT QAMATS? Cuneiform glyph formation is nowhere near as
complex as CJK's.)
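The Hebrew sequence above can be inspected directly against the Unicode Character Database; the points are ordinary combining marks that any conformant renderer must already know how to stack on a base letter:

```python
import unicodedata

# DALET with dagesh and qamats, encoded as a base letter plus two combining points.
seq = "\u05D3\u05BC\u05B8"

for ch in seq:
    print(f"U+{ord(ch):04X}  ccc={unicodedata.combining(ch):>2}  {unicodedata.name(ch)}")

# The two points carry nonzero canonical combining classes, so rendering them
# is base-plus-marks composition - the same kind of composition a dynamic
# cuneiform model would require.
```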
7 The dynamic model is too fragile; unencoded glyphs will be "hidden"
in the Private Use Area or in OpenType glyph tables. (Now, we are moving
into real objections. Not being a font designer, I called a font designer
friend of mine and he DID say there are tool problems and operating
system problems associated with non-code-point-specified glyphs in OpenType.)
8 Dynamic is too complex - it will require a specified syntax. (Yes, we
will need to specify a syntax and associated properties for the modifier
characters, namely what the permissible character sequences are and how
the modifier characters react with the base characters and with one another.)
EXCERPTS FROM RESPONSES TO A PROPOSAL TO DYNAMICALLY ENCODE CUNEIFORM
[I know it is always risky to pull quotes out of context, but I have
attempted to represent the authors' intentions fairly. Full responses
can, of course, be read in the list archive.]
Patrick Andries: "I believe it is a more powerful system (an open one)
but it will depend on those fonts and keyboards being developed."
Rick McGowan: "Bringing up a fundamental model issue like this again at
this stage (6 weeks after the current proposal was presented to UTC)
could potentially derail the cuneiform encoding process indefinitely."
Michael Everson (an author of the static proposal): "Out of the question.
We have accepted a different model."
Christopher Fynn: "This fits in best with the Unicode character encoding
model and is definitely the way to go, particularly if the script was
productive. ... I think it is always a good idea to closely mirror in
encoding the way a script system actually works - and break it down into
primitives or base characters, combining marks and modifiers."
William Overington: "The Unicode encoding of cuneiform needs, in my
opinion, to be encoded to last. Each displayable glyph needs a formal
Unicode code point, otherwise the glyphs will end up either encoded using
Private Use Area code points all over the place or else being hidden away
in glyph tables within OpenType fonts."
Steve Tinney (an author of the static proposal): "This approach has a lot
to commend it, and I came to ICE1 with this suggestion. There was
substantial discussion of the pros and cons, and I ended up feeling that
encoding the complex signs as characters was as good a way to go as any.
I would not advocate changing that decision."
Lloyd Anderson: "Like several of us (including Feuerherm, Tinney,
recently Dean), I have considered the possibility of encoding containers
x contents. I wavered in favor of it at some points, but not now. ... The
greatest advantage of an encoding as [container x contained]
would be that it accommodates additional signs, and no change to the
default (binary or default sorting tables) would be necessary to
accommodate them. There are *many* such additional signs (contra Tinney..."
Karljuergen Feuerherm (an author of the static proposal): "As ought to be
clear from my many postings on the subject over the last n years, I have
generally favoured encoding at low level and using combinations of some
kind to describe the more complex items, and have advocated this time and
again. To be honest, I have never really felt that the pros and cons were
thoroughly thought through, and this has been a disappointment to me.
However, at this point, I am not at all interested in reopening any past
arguments of any kind. We've got a preliminary proposal in which has
taken a certain direction, and we must, for pragmatic reasons, maintain
it. ... As much as I hate to say it, a mediocre functional encoding is
still better than no encoding at all."
Michael Everson: "Whatever the merits of one possible encoding over
another may be in theory, it should be remembered that one of the reasons
the static-glyph model was preferred over the dynamic-glyph model is that
it is far easier to render. It would be possible to encode Chinese
characters with dynamic fiddly bits which would interact with other base
characters. But it'd be a font nightmare. There's no payoff."
This archive was generated by hypermail 2.1.5 : Tue Jan 13 2004 - 17:09:51 EST