From: Dean Snyder (dean.snyder@jhu.edu)
Date: Thu Mar 10 2005 - 10:36:11 CST
Asmus Freytag wrote at 8:48 PM on Wednesday, March 9, 2005:
>At 07:36 PM 3/9/2005, Dean Snyder wrote:
>>I've made it very clear that THE basis for my thinking on encoding damage
>>indicators is to enable "guaranteed" integrity for damaged, interchanged
>>plain text.
>
>A similar argument could be made for the absolute integrity of mathematical
>expressions. More people (users) rely on textual representations of
>mathematical expressions in their work than on transcriptions of damaged
>plain text, and in many cases there are potentially severe consequences if
>mathematical expressions are inadvertently altered.
The difference between mathematical notation and my suggested
application, ancient texts, is that every ancient document is a unique
autograph and it is important to maintain a minimum of textual integrity
for all plain text representations of those documents.
>Nevertheless, nobody asserts that mathematical expressions *must* be in
>plain text. All users of mathematics agree that some form of convention,
>going beyond plain text, is needed. The two top contenders, TeX and MathML,
>use very different approaches to the markup, TeX focusing only on defining
>the visual appearance, MathML focusing on the underlying mathematical
>structure.
Math notational systems must support page layout, most definitely the
purview of markup. I'm only talking about the actual presence or partial
presence of a character, a much simpler concept that has nothing to do
with page layout.
>>My reasoning goes to the core of
>>what separates text from meta-text. THAT, I believe, is the proper basis
>>for this discussion, not the merits or demerits of any
>>particular markup system.
>
>And given my comments above, it is not the task of plain text to indicate
>damage and similar types of information.
>
>Some people feel that the choice between plain-text and markup should not
>be an all-or-nothing proposition.... [Murray Sargent's minimal markup
>scheme for mathematics.]
There is no doubt in my mind about the need for epigraphic meta-textual
markup; my only suggestion has been the CONCURRENT need for simple plain
text damage indicators.
>In this, it is similar to ideographic description sequences. If you have
>software support to display a built-up sequence, the text can act like
>formatting instructions, if you don't, you get a human-readable, symbolic
>description language.
Description sequences are an interesting analogy I had not thought of.
My initial idea was that the damage indicators should be ignorable and
just travel with their plain text as invisible characters until some
software knows how to handle them. It's an interesting idea that they
should be potentially visible. If that were the case, and given a 3x3
damage matrix, we have 2 main encoding options:
1) encode 9 damage characters, the union of which indicates the extent
of damage
2) encode 512 damage characters, one for each possible combination of
damaged areas
The advantage of the first one is the small number of encoded characters;
the disadvantages are the ugliness and difficulty of human parsing, more
complicated programmatic handling, more storage, and slower processing.
The advantages of number 2 are the simplicity for human parsing, a more
aesthetic text stream for human readability, easier programmatic
handling, less storage, and faster processing; the disadvantage is the
larger number of encoded characters.
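To make the comparison concrete, here is a minimal sketch of the two options in Python. It assumes the 3x3 matrix is numbered row by row, and it borrows two Private Use Area ranges (BASE_SINGLE and BASE_COMBINED) purely for illustration; neither range, nor any of the function names, is an actual or proposed Unicode assignment. Note that counting 512 combinations includes the "no damage" case.

```python
# Hypothetical sketch: the code point ranges below are NOT real Unicode
# assignments. They sit in the Private Use Area purely for illustration.
ROWS, COLS = 3, 3
CELLS = ROWS * COLS        # 9 cells in the damage matrix
BASE_SINGLE = 0xE000       # option 1: 9 characters, one per cell
BASE_COMBINED = 0xE100     # option 2: 512 characters, one per combination


def cell_bit(row: int, col: int) -> int:
    """Return the bit for one cell; row and col are 0-based."""
    return 1 << (row * COLS + col)


def encode_option1(mask: int) -> str:
    """One character per damaged cell; their union gives the extent."""
    return "".join(
        chr(BASE_SINGLE + i) for i in range(CELLS) if mask & (1 << i)
    )


def decode_option1(text: str) -> int:
    """Rebuild the damage mask by OR-ing together the per-cell characters."""
    mask = 0
    for ch in text:
        mask |= 1 << (ord(ch) - BASE_SINGLE)
    return mask


def encode_option2(mask: int) -> str:
    """A single character whose offset is the 9-bit damage mask itself."""
    return chr(BASE_COMBINED + mask)


def decode_option2(ch: str) -> int:
    """Recover the damage mask from the single combined character."""
    return ord(ch) - BASE_COMBINED


# Example: the whole top row and the centre cell are damaged.
mask = cell_bit(0, 0) | cell_bit(0, 1) | cell_bit(0, 2) | cell_bit(1, 1)
option1 = encode_option1(mask)   # four characters follow the damaged sign
option2 = encode_option2(mask)   # one character follows the damaged sign
```

In this example option 1 needs four indicator characters and a loop to recover the damage, while option 2 needs one character and a subtraction, which is the storage and processing difference described above.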
I like the 512 visible damage indicator characters idea.
>In none of these cases do you have the expectation that *all* software (or
>even potentially all software) would be required to treat any characters as
>other than perfectly ordinary graphical characters - although *some*
>software can choose to follow specific conventions.
That's the beauty of this model.
>That is very similar to
>XML, where the source code is plain text, and the result is something else.
>But in none of these cases is the information itself encoded in plain text
>- it's encoded in a convention that uses plain text as a source form.
You lost me there. Ideographic description sequences are plain text.
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi/
http://users.adelphia.net/~deansnyder/