Re: Encoded rendering instructions (was Unicode's Mandate)

From: Dean Snyder
Date: Wed Mar 09 2005 - 12:44:21 CST

    Peter Constable wrote at 9:49 PM on Monday, March 7, 2005:

    >Whenever anyone suggests that something isn't plain text, one can always
    >respond that the question of what is plain text is begged.
    >A: "I want characters for A-Z that flash when displayed."
    >B: "That isn't plain text."
    >A: "Well, that's begging the question..."

    Why stop with this ridiculous straw man example? Why not suggest
    characters that deform based on ambient sounds?

    >Ken's point is that, after years of working with architecting and
    >designing implementations for processing of text, he (and many others)
    >would consider this far better handled as rich or marked-up text rather
    >than plain text.

    Apparently others do not agree with you. That makes this a legitimate
    candidate for discussion, which is all I've suggested so far.

    >I quite agree with that judgment. This idea has several complications. I
    >certainly think it's a good idea that paleographers should have a way to
    >convey such information, but not in plain text.

    You give no reason why encoded damage indicators are a bad idea. And you
    offer no definition of plain text.

    >The one thing that
    >resembles this that I *do* think might be appropriate for plain text is
    >the case of a base character that has become illegible due to
    >degradation of the document, while there are marks that combine with
    >that base that are still identifiable. This goes rather beyond that,

    Why is damage to a base character with combining characters plain text,
    but damage to a stand-alone base character is not plain text?

    >Some complications:
    >- You say that one of these rendering modifier characters would precede
    >the character they modify. How do they interact with combining

    There are at least two ways to handle base plus combining characters with
    damage - as conjointly damaged components or as separately damaged units.
    For my purposes conjointly damaged components would be sufficient.

    >Are they combining characters?

    I suggest that damage indicators be default ignorable, invisible base
    characters; I see no reason for them to be combining characters.

    >Do they have scope over only
    >the immediately adjacent character or over a combining sequence?

    See above. For my purposes, a sequence of damage indicators would apply
    to a combining sequence.
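
A minimal sketch of the "conjointly damaged" reading described above, in which indicators precede a base character and apply to the whole combining sequence. No damage indicators exist in Unicode, so the two Private Use Area code points and their labels are purely hypothetical stand-ins, as is the name `parse_damaged_text`. How an indicator placed between a base and a mark should behave is not settled in this thread; the sketch simply assumes it is folded into the sequence's damage.

```python
import unicodedata

# Hypothetical stand-ins: Private Use Area code points, not real
# Unicode damage indicators.
DAMAGE_INDICATORS = {"\uE000": "partial", "\uE001": "severe"}

def parse_damaged_text(text):
    """Split text into (combining_sequence, damage_labels) pairs."""
    units = []
    pending = []  # indicators seen since the last non-indicator character
    for ch in text:
        if ch in DAMAGE_INDICATORS:
            pending.append(DAMAGE_INDICATORS[ch])
        elif unicodedata.combining(ch) and units:
            # Combining mark: attach it to the preceding base. Any
            # indicator seen just before the mark is merged into the
            # damage of the whole sequence (an assumption).
            seq, damage = units[-1]
            units[-1] = (seq + ch, damage + pending)
            pending = []
        else:
            # Base character: it starts a new sequence and takes the
            # indicators that preceded it.
            units.append((ch, pending))
            pending = []
    # Indicators with nothing after them are dropped.
    return units
```

For example, `parse_damaged_text("a\uE000e\u0301b")` marks only the sequence "e" plus combining acute as damaged and leaves "a" and "b" untouched.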

    >If the
    >latter, what happens if a combining sequence has more than one, e.g.
    >< base, R1, mark, R2, mark, R3, mark >
    >where Rn are various rendering modifiers.

    Rules of precedence ... with damage indicators having highest precedence.

    >- It is not at all obvious how these would interact with scripts in
    >which the character-glyph mapping is complex and non-linear.

    Since the damage, by definition, is based on visibility to humans it
    would track the visible glyphs.

    >- You conceive of this as simply having an effect on the rasterized
    >image, e.g. in terms of alpha-channel transparency. You cannot expect
    >such a thing to get widely implemented -- it would remain something
    >found only in special-purpose applications.

    That's YOUR opinion about one of my original questions, but even if it
    were true, there would still be enough reason to encode these damage

    >This would be the case for a
    >variety of reasons, starting with people not wanting to tinker with code
    >that is widely-deployed

    Bit manipulation of raster images is trivial and ubiquitous; it is not
    "tinkering". For example, all modern OSes have added anti-aliasing of
    text, which is a much more extensive, invasive, pervasive, and
    complicated "tinkering with code" than what I have suggested.

    >has been fairly stable for some time, is
    >functioning as required without problem for millions of users, but the
    >changes being requested would involve significant rework of the code

    ANY code change is a threat to stability, but the kind of change involved
    with graying bits is minimal in threat - set a bit-field flag when damage
    indicators are encountered in a text stream and at rasterization time
    alter the appropriate pixels in the ink bounding box. This is not a
    threat to stability. Nor is it much work. Plus I dare say it would hardly
    ever be encountered in text streams and therefore not be a hit on performance.
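
The flag-then-gray step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not any real rasterizer's code: the raster is modeled as rows of ink-coverage values from 0 to 255, and `FLAG_DAMAGED`, `GlyphRun`, `apply_damage`, and the `keep` factor are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

FLAG_DAMAGED = 0x01  # hypothetical bit in a per-glyph flag field

@dataclass
class GlyphRun:
    bbox: tuple     # ink bounding box: (left, top, right, bottom), right/bottom exclusive
    flags: int = 0  # set to FLAG_DAMAGED when an indicator preceded the glyph

def apply_damage(coverage, glyphs, keep=0.5):
    """Scale ink coverage inside the ink box of each flagged glyph.

    coverage is a list of rows of ints in 0..255 and is modified in place.
    """
    for glyph in glyphs:
        if not glyph.flags & FLAG_DAMAGED:
            continue  # undamaged glyphs cost one flag test and nothing more
        left, top, right, bottom = glyph.bbox
        for y in range(top, bottom):
            row = coverage[y]
            for x in range(left, right):
                row[x] = int(row[x] * keep)
    return coverage
```

Text without damage indicators never enters the inner loops, which is in line with the point that the change would rarely affect performance.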

    >and would benefit a very limited group of users.

    It would benefit all generators, processors, and consumers of encoded
    ancient texts.

    >> But the real need is for sometimes very significant historic character
    >> damage to travel everywhere with plain text representations of that
    >The information being conveyed is not part of the core, logical text --
    >the sequence of character identities -- but is a layer of attributes
    >applied to that core text.

    No. Damage indication goes to what actually IS the core text. That's why
    it should have the highest precedence in rendering and why it should be a
    part of encoded plain text.

    >Representing it in terms of markup would make
    >complete sense;
    >on the other hand, it's not clear why there might be a
    >need to represent such information in plain text.

    Markup is essential for sophisticated control of damaged text (although,
    as I've suggested in another post here, the current specifications are
    inadequate); I'm only suggesting that a minimal encoded set of damage
    indicator characters is indispensable for roughly indicating the actual
    content and integrity of damaged plain text passages.


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897
