Re: Encoded rendering instructions (was Unicode's Mandate)

From: Dean Snyder
Date: Thu Mar 10 2005 - 10:36:11 CST

    Asmus Freytag wrote at 8:48 PM on Wednesday, March 9, 2005:

    >At 07:36 PM 3/9/2005, Dean Snyder wrote:
    >>I've made it very clear that THE basis for my thinking on encoding damage
    >>indicators is to enable "guaranteed" integrity for damaged, interchanged
    >>plain text.
    >A similar argument could be made for the absolute integrity of mathematical
    >expressions. More people (users) rely on textual representations of
    >expressions in their work than on transcriptions of damaged plain text, and
    >in many cases there are potentially severe consequences if mathematical
    >expressions are inadvertently altered.

    The difference between mathematical notation and my suggested
    application, ancient texts, is that every ancient document is a unique
    autograph and it is important to maintain a minimum of textual integrity
    for all plain text representations of those documents.

    >Nevertheless, nobody asserts that mathematical expressions *must* be in
    >plain text. All users of mathematics agree that some form of convention,
    >going beyond plain text, is needed. The two top contenders, TeX and MathML
    >use very different approaches to the markup, TeX focusing only on defining
    >the visual appearance, MathML focusing on the underlying mathematical

    Math notational systems must support page layout, most definitely the
    purview of markup. I'm only talking about the actual presence or partial
    presence of a character, a much simpler concept that has nothing to do
    with page layout.

    >>My reasoning goes to the core of
    >>what separates text from meta-text. THAT, I believe, is the proper basis
    >>for this discussion, not the merits or demerits of any
    >>particular markup system.
    >And given my comments above, it is not the task of plain text to indicate
    >damage and similar type of information.
    >Some people feel that the choice between plain-text and markup should not
    >be an all-or-nothing proposition.... [Murray Sargent's minimal markup
    >scheme for mathematics.]

    There is no doubt in my mind about the need for epigraphic meta-textual
    markup; my only suggestion has been the CONCURRENT need for simple plain
    text damage indicators.

    >In this, it is similar to ideographic description sequences. If you have
    >software support to display a built-up sequence, the text can act like
    >formatting instructions, if you don't, you get a human-readable, symbolic
    >description language.

    Description sequences are an interesting analogy I had not thought of.

    My initial idea was that the damage indicators should be ignorable and
    just travel with their plain text as invisible characters until some
    software knows how to handle them. It's an interesting idea that they
    should be potentially visible. If that were the case, and given a 3X3
    damage matrix, we have 2 main encoding options:

     1) encode 9 damage characters, the union of which indicates the extent
    of damage
     2) encode 512 damage characters, one for each possible combination of
    damaged areas
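
    The two options can be related with a small sketch. This is only an
    illustration: the row-by-row numbering of the cells (0 to 8) and the
    function names are assumptions, and no actual code points are implied.
    Note that 512 = 2^9 counts every subset of the nine cells, including
    the empty "no damage" subset.

```python
from typing import FrozenSet, List

GRID_CELLS = 9  # 3x3 damage matrix; cells numbered 0..8 row by row (assumed)


def to_option1(damaged: FrozenSet[int]) -> List[int]:
    """Option 1: one indicator per damaged cell; their union gives the extent."""
    return sorted(damaged)


def to_option2(damaged: FrozenSet[int]) -> int:
    """Option 2: a single indicator per combination, as an index in 0..511."""
    index = 0
    for cell in damaged:
        if not 0 <= cell < GRID_CELLS:
            raise ValueError(f"cell {cell} is outside the 3x3 matrix")
        index |= 1 << cell  # set the bit for this damaged cell
    return index


def from_option2(index: int) -> FrozenSet[int]:
    """Recover the set of damaged cells from an option 2 index."""
    if not 0 <= index < 2 ** GRID_CELLS:
        raise ValueError(f"index {index} is outside 0..511")
    return frozenset(c for c in range(GRID_CELLS) if (index >> c) & 1)


# Example: damage along the main diagonal (cells 0, 4 and 8).
diagonal = frozenset({0, 4, 8})
option1_indicators = to_option1(diagonal)  # three separate indicators
option2_indicator = to_option2(diagonal)   # one combined indicator
```

    Either form carries the same information; the round trip through
    from_option2 recovers the original set of damaged cells.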

    The advantage of the first one is the small number of encoded characters;
    the disadvantages are the ugliness and difficulty of human parsing, more
    complicated programmatic handling, more storage, and slower processing.

    The advantages of number 2 are the simplicity for human parsing, a more
    aesthetic text stream for human readability, easier programmatic
    handling, less storage, and faster processing; the disadvantage is the
    larger number of encoded characters.
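
    The storage side of this trade-off can be put in rough numbers. The
    sketch below makes two loud assumptions: every one of the 512 damage
    patterns is treated as equally likely (real texts will not behave this
    way), and option 2 spends one indicator even on the undamaged pattern.
    The function names are hypothetical.

```python
GRID_CELLS = 9  # cells in the 3x3 damage matrix


def option1_length(pattern: int) -> int:
    """Option 1 needs one indicator per damaged cell (one per set bit)."""
    return bin(pattern).count("1")


def option2_length(pattern: int) -> int:
    """Option 2 always needs exactly one indicator (assumed, see lead-in)."""
    return 1


patterns = range(2 ** GRID_CELLS)  # all 512 possible damage patterns

total_option1 = sum(option1_length(p) for p in patterns)
total_option2 = sum(option2_length(p) for p in patterns)

average_option1 = total_option1 / len(patterns)
average_option2 = total_option2 / len(patterns)
```

    Under these assumptions option 1 averages 4.5 indicators per damaged
    sign and option 2 exactly 1, at the cost of a 512-character repertoire
    instead of 9.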

    I like the 512 visible damage indicator characters idea.

    >In none of these cases do you have the expectation that *all* software (or
    >even potentially all software) would be required to treat any characters as
    >other than perfectly ordinary graphical characters - although *some*
    >software can choose to follow specific conventions.

    That's the beauty of this model.

    >That is very similar to
    >XML, where the source code is plain text, and the result is something else.
    >But in none of these cases is the information itself encoded in plain text
    >- it's encoded in a convention that uses plain text as a source form.

    You lost me there. Ideographic description sequences are plain text.


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897
