From: Dean Snyder (dean.snyder@jhu.edu)
Date: Wed Mar 09 2005 - 12:44:21 CST
Peter Constable wrote at 9:49 PM on Monday, March 7, 2005:
>Whenever anyone suggests that something isn't plain text, one can always
>respond that the question of what is plain text is begged.
>
>A: "I want characters for A-Z that flash when displayed."
>B: "That isn't plain text."
>A: "Well, that's begging the question..."
Why stop with this ridiculous straw man example? Why not suggest
characters that deform based on ambient sounds?
>Ken's point is that, after years of working with architecting and
>designing implementations for processing of text, he (and many others)
>would consider this far better handled as rich or marked-up text rather
>than plain text.
Apparently others do not agree with you. That makes this a legitimate
candidate for discussion, which is all I've suggested so far.
>I quite agree with that judgment. This idea has several complications. I
>certainly think it's a good idea that paleographers should have a way to
>convey such information, but not in plain text.
You give no reason why encoded damage indicators are a bad idea. And you
offer no definition of plain text.
>The one thing that
>resembles this that I *do* think might be appropriate for plain text is
>the case of a base character that has become illegible due to
>degradation of the document, while there are marks that combine with
>that base that are still identifiable. This goes rather beyond that,
>though.
Why is damage to a base character with combining characters plain text,
but damage to a stand-alone base character is not plain text?
>Some complications:
>
>- You say that one of these rendering modifier characters would precede
>the character they modify. How do they interact with combining
>sequences?
There are at least two ways to handle base plus combining characters with
damage - as conjointly damaged components or as separately damaged units.
For my purposes conjointly damaged components would be sufficient.
>Are they combining characters?
I suggest that damage indicators be default ignorable, invisible base
characters; I see no reason for them to be combining characters.
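A rough sketch of what "default ignorable" would mean for a process that does not care about damage, such as a search or comparison routine. No damage indicator characters exist in Unicode; the Private Use Area code points and the function names below are hypothetical stand-ins used only for illustration.

```python
# Hypothetical stand-ins: two Private Use Area code points play the role
# of damage indicators. These are NOT assigned Unicode characters.
DAMAGE_INDICATORS = {"\uE000", "\uE001"}


def strip_damage_indicators(text):
    """Drop the (hypothetical) damage indicators, keeping everything else."""
    return "".join(ch for ch in text if ch not in DAMAGE_INDICATORS)


def matches_ignoring_damage(a, b):
    """Compare two strings as a process that ignores damage indicators would."""
    return strip_damage_indicators(a) == strip_damage_indicators(b)
```

A process that does understand the indicators would keep them and act on them at display time; one that does not would simply behave as if they were absent.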
>Do they have scope over only
>the immediately adjacent character or over a combining sequence?
See above. For my purposes, a sequence of damage indicators would apply
to a combining sequence.
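The "conjointly damaged" reading can be sketched as follows, assuming indicators precede the base character and apply to the whole combining sequence. The code point U+E000 is a hypothetical Private Use Area stand-in, and the function name is invented for this illustration; `unicodedata.combining` is the Python standard library call that returns a nonzero combining class for combining marks.

```python
import unicodedata

# Hypothetical stand-in for a damage indicator (Private Use Area).
DAMAGE_INDICATOR = "\uE000"


def damaged_sequences(text):
    """Split text into (combining_sequence, indicator_count) pairs.

    Indicators are counted toward the combining sequence they precede.
    Under the conjoint model assumed here, an indicator that turns up
    before a combining mark is folded into the enclosing sequence.
    """
    sequences = []
    pending = 0  # indicators seen since the last ordinary character
    for ch in text:
        if ch == DAMAGE_INDICATOR:
            pending += 1
        elif unicodedata.combining(ch) and sequences:
            # A combining mark joins the previous base's sequence.
            seq, count = sequences[-1]
            sequences[-1] = (seq + ch, count + pending)
            pending = 0
        else:
            # A base character starts a new sequence.
            sequences.append((ch, pending))
            pending = 0
    return sequences
```

The alternative model, separately damaged units, would instead keep a per-character count; that is not shown here.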
>If the
>latter, what happens if a combining sequence has more than one, e.g.
>< base, R1, mark, R2, mark, R3, mark >
>where Rn are various rendering modifiers.
Rules of precedence ... with damage indicators having highest precedence.
>- It is not at all obvious how these would interact with scripts in
>which the character-glyph mapping is complex and non-linear.
Since the damage, by definition, is based on visibility to humans it
would track the visible glyphs.
>- You conceive of this as simply having an effect on the rasterized
>image, e.g. in terms of alpha-channel transparency. You cannot expect
>such a thing to get widely implemented -- it would remain something
>found only in special-purpose applications.
That's YOUR opinion about one of my original questions, but even if it
were true, there would still be enough reason to encode these damage
indicators.
>This would be the case for a
>variety of reasons, starting with people not wanting to tinker with code
>that is widely-deployed
Bit manipulation of raster images is trivial and ubiquitous; it is not
"tinkering". For example, all modern OSes have added anti-aliasing of
text, which is a much more extensive, invasive, pervasive, and
complicated "tinkering with code" than what I have suggested.
>has been fairly stable for some time, is
>functioning as required without problem for millions of users, but the
>changes being requested would involve significant rework of the code
ANY code change is a threat to stability, but the kind of change involved
with graying bits poses a minimal threat: set a bit-field flag when damage
indicators are encountered in a text stream and, at rasterization time,
alter the appropriate pixels in the ink bounding box. This is not a
threat to stability. Nor is it much work. Plus, I dare say damage
indicators would hardly ever be encountered in text streams and would
therefore not be a hit on performance.
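A minimal sketch of the two steps just described, under stated assumptions: the indicator is a hypothetical Private Use Area code point, a glyph bitmap is modeled as a list of rows of 0-255 coverage values, and "graying" is modeled as scaling coverage inside the ink bounding box. None of these names or representations come from an actual rendering engine.

```python
# Hypothetical stand-in for a damage indicator (Private Use Area).
DAMAGE_INDICATOR = "\uE000"
FLAG_DAMAGED = 0x01  # bit-field flag set on the character after an indicator


def flag_characters(text):
    """Return (char, flags) pairs; an indicator flags the next character."""
    result = []
    pending = 0
    for ch in text:
        if ch == DAMAGE_INDICATOR:
            pending |= FLAG_DAMAGED
            continue
        result.append((ch, pending))
        pending = 0
    return result


def ink_bounding_box(bitmap):
    """Return (top, bottom, left, right) of nonzero pixels, or None if blank."""
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    cols = [x for row in bitmap for x, value in enumerate(row) if value]
    if not rows:
        return None
    return min(rows), max(rows), min(cols), max(cols)


def gray_damaged(bitmap, flags, factor=0.5):
    """Scale coverage inside the ink bounding box when the flag is set."""
    if not flags & FLAG_DAMAGED:
        return bitmap
    box = ink_bounding_box(bitmap)
    if box is None:
        return bitmap
    top, bottom, left, right = box
    out = [row[:] for row in bitmap]  # copy so the input is not modified
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            out[y][x] = int(out[y][x] * factor)
    return out
```

Text without indicators never sets the flag, so `gray_damaged` returns the bitmap untouched and the common path does no extra pixel work.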
>and would benefit a very limited group of users.
It would benefit all generators, processors, and consumers of encoded
ancient texts.
>> But the real need is for sometimes very significant historic character
>> damage to travel everywhere with plain text representations of that
>> text.
>
>The information being conveyed is not part of the core, logical text --
>the sequence of character identities -- but is a layer of attributes
>applied to that core text.
No. Damage indication goes to what actually IS the core text. That's why
it should have the highest precedence in rendering and why it should be a
part of encoded plain text.
>Representing it in terms of markup would make
>complete sense;
>on the other hand, it's not clear why there might be a
>need to represent such information in plain text.
Markup is essential for sophisticated control of damaged text (although,
as I've suggested in another post here, the current specifications are
inadequate); I'm only suggesting that a minimal encoded set of damage
indicator characters is indispensable for roughly indicating the actual
content and integrity of damaged plain text passages.
Respectfully,
Dean A. Snyder
Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218
office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi/
http://users.adelphia.net/~deansnyder/