Re: Encoded rendering instructions (was Unicode's Mandate)

From: Dean Snyder
Date: Tue Mar 08 2005 - 20:27:03 CST

    Andrew C. West wrote at 4:26 AM on Tuesday, March 8, 2005:

    >On Mon, 07 Mar 2005 22:33:04 -0500, Dean Snyder wrote:
    >> But the real need is for sometimes very significant historic character
    >> damage to travel everywhere with plain text representations of that text.
    >I would disagree with this assertion. For palaeographers and epigraphers the
    >primary reference to a text is not going to be a plain text representation,
    >but a photograph, scan, rubbing or even a hand-drawing. The secondary
    >reference will be a critical transcription or transliteration of the text
    >which describes the physical and palaeographical features of the text, and
    >notes any textual and lacunae, and where necessary provides conjectural
    >readings for damaged characters. Plain text can never faithfully represent
    >all the palaeographical details of a text, and no palaeographer in their
    >right mind would attempt to do so. ...
    >Greying quadrants of a glyph is a very blunt instrument, as damage does not
    >necessarily resolve itself into neat squares. In any case, such a mechanism
    >can only give a hint of the actual damage, and other scholars will still
    >have to go back to a facsimile of the original text to confirm the precise
    >nature of the damage and confirm the supplied reading.

    I think you missed my point. As I stated in my original email [emphasis
    added]:

    "In transcriptions of damaged ancient texts it is important and useful to
    indicate ROUGHLY the extent of damage to a glyph. Of course, this is NOT
    a substitute for direct examination of the original document, but it is a
    useful property for programmatic processing of ancient texts and their
    approximate renderings."

    As a reader of ancient texts I would never research a damaged text based
    on representations in ANY encoded form. The characters I have suggested
    are merely for marking, in perpetuity in plain text, damaged characters
    as damaged, with only a modicum of the extent of damage indicated. These
    characters would presumably be default ignorable, and whether or not they
    are rendered, the intent is that they be interpreted, by both humans
    and machines, as "Even though I think this damaged character is this, do
    not trust my interpretation; look instead at the original."

    But just that information, completely inadequate as it is for epigraphy,
    is extremely useful for the interchange, preliminary interpretation, and
    machine processing of one-of-a-kind, damaged plain text passages.
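
    As a rough illustration of the machine processing meant here, the
    following is a minimal sketch only. It assumes four hypothetical
    damage marks, one per glyph quadrant, placed after the character they
    qualify. The Private Use Area code points, the quadrant scheme, and
    all function names are placeholders invented for this example; none
    of them is a proposed or encoded Unicode character.

```python
# Hypothetical damage marks. These are Private Use Area code points chosen
# only for illustration; they are NOT proposed or encoded characters.
DAMAGE_MARKS = {
    "\uE000": "upper left",
    "\uE001": "upper right",
    "\uE002": "lower left",
    "\uE003": "lower right",
}


def parse_transcription(text):
    """Return a list of (character, damaged_quadrants) pairs.

    Assumption: each damage mark follows the character it qualifies.
    A mark with no preceding character is silently dropped.
    """
    result = []
    for ch in text:
        if ch in DAMAGE_MARKS:
            if result:
                result[-1][1].append(DAMAGE_MARKS[ch])
        else:
            result.append((ch, []))
    return result


def strip_marks(text):
    """Treat the marks as ignorable: return the bare reading."""
    return "".join(ch for ch in text if ch not in DAMAGE_MARKS)


def untrusted_positions(text):
    """Indexes (into the bare reading) of characters flagged as damaged."""
    return [i for i, (_, damage) in enumerate(parse_transcription(text)) if damage]


if __name__ == "__main__":
    sample = "AB\uE003C\uE000\uE001"
    print(strip_marks(sample))
    print(untrusted_positions(sample))
```

    A process that ignores the marks still sees the plain reading, while
    one that understands them can flag every reading that should be
    checked against the original.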

    >Furthermore, although greying may work OK on a computer screen (and that's
    >debatable), when you print out the plain text there's a good chance that
    >the greying will not be obvious to the reader. ...

    I'm not so concerned about rendering damaged characters, although that
    will prove to be useful; my main concern is the actual preservation in
    plain text of the indication that characters ARE damaged along with rough
    approximations of that damage.

    >Another problem that I see with Dean's suggestion is that damage often
    >makes it difficult or impossible to be sure just what the damaged character
    >actually is. Dean's greying mechanism presupposes that you can identify the
    >character, and recognise that it is, for example, "A" with obliterated
    >lower right corner. ...

    All it would take to solve the unknown character problem is to simply
    encode "UNKNOWN CHARACTER". [at Ux123E4; suggested glyph - a solid black
    rectangle ;-) ]


    By the way, I forgot to mention that we would need at least one more
    rendering instruction character - a RESTORED FROM PARALLEL PASSAGE
    character, which simply means what its name says - this damaged character
    can be restored based on its presence in a parallel passage. (I
    specifically stated "at least one more" character, because the actual
    issue gets complicated by other factors such as the state of preservation
    of the character in the parallel passage and by the locus of the parallel
    passage itself - e.g., is it elsewhere in the same text or in another text?)
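
    The distinctions just listed (unknown character, damaged character,
    restoration from a parallel passage, the state of that parallel, and
    its locus) can be sketched as a small data model. This is only an
    illustration under assumptions: the class and field names below are
    hypothetical and do not correspond to any encoded characters or
    existing library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ParallelLocus(Enum):
    """Where the parallel passage is found (hypothetical categories)."""
    SAME_TEXT = "same text"
    OTHER_TEXT = "other text"


@dataclass(frozen=True)
class Restoration:
    """A restoration based on a parallel passage."""
    locus: ParallelLocus
    parallel_is_damaged: bool  # state of preservation in the parallel


@dataclass(frozen=True)
class Sign:
    """One transcribed character; reading=None stands for UNKNOWN CHARACTER."""
    reading: Optional[str]
    damaged: bool = False
    restoration: Optional[Restoration] = None


def describe(sign):
    """Summarise how far the transcriber's reading should be trusted."""
    if sign.reading is None:
        return "unknown character"
    if not sign.damaged:
        return f"{sign.reading}: intact"
    if sign.restoration is None:
        return f"{sign.reading}: damaged, reading unverified"
    state = "damaged" if sign.restoration.parallel_is_damaged else "intact"
    return (
        f"{sign.reading}: damaged, restored from {state} parallel "
        f"in {sign.restoration.locus.value}"
    )


if __name__ == "__main__":
    restored = Sign("A", True, Restoration(ParallelLocus.OTHER_TEXT, False))
    for sign in (Sign(None), Sign("A"), Sign("A", True), restored):
        print(describe(sign))
```

    Each extra factor becomes one more field, which is one way to see why
    a single RESTORED FROM PARALLEL PASSAGE character may not be enough.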


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897
