Re: Encoded rendering instructions (was Unicode's Mandate)

From: Andrew C. West
Date: Tue Mar 08 2005 - 06:26:37 CST


    On Mon, 07 Mar 2005 22:33:04 -0500, Dean Snyder wrote:
    > Kenneth Whistler wrote at 5:50 PM on Monday, March 7, 2005:
    > >What *could* be appropriate for encoding as characters, from the
    > >fields of paleography and epigraphy here, would be entire symbols
    > >indicating quadrant damage -- in other words, some thematic take on
    > >sets of quadrant symbols such as U+2596..U+259F, U+25E7, U+25E8,
    > >U+25F0..U+25F3, etc, which might reflect use in text to *discuss*
    > >glyph damage and lacunae, etc. This would be quite different from
    > >trying to encode a bunch of format controls to actually make
    > >the text *render* with damage and lacunae.
    > But the real need is for sometimes very significant historic character
    > damage to travel everywhere with plain text representations of that text.

    I would disagree with this assertion. For palaeographers and epigraphers the
    primary reference to a text is not going to be a plain text representation, but
    a photograph, scan, rubbing or even a hand-drawing. The secondary reference will
    be a critical transcription or transliteration of the text which describes the
    physical and palaeographical features of the text, and notes any textual damage
    and lacunae, and where necessary provides conjectural readings for damaged
    characters. Plain text can never faithfully represent all the palaeographical
    details of a text, and no palaeographer in their right mind would attempt to do
    so. When I was examining 13th century tombstones with 'Phags-pa inscriptions in
    Quanzhou (Marco Polo's Zayton) recently, I took digital photos and made drawings
    and notes in my notebook, and back home, when I transcribed the text into
    Unicode on my computer I wrote it in an HTML page, and added numbered
    annotations about individual characters where necessary. At no point did it
    occur to me to try to represent the text in unannotated plain text format ...
    and if I had, what use would it have been to anyone?

    Greying quadrants of a glyph is a very blunt instrument, as damage does not
    necessarily resolve itself into neat squares. In any case, such a mechanism can
    only give a hint of the actual damage, and other scholars will still have to go
    back to a facsimile of the original text to confirm the precise nature of the
    damage and confirm the supplied reading. Furthermore, although greying may work
    OK on a computer screen (and that's debatable), when you print out the plain
    text there's a good chance that the greying will not be obvious to the reader. I
    agree with Ken and Doug that a more visible means of identifying damage, such as
    special IDC-like characters, would be much more appropriate, and would avoid all
    the implementation issues associated with localised greying of parts of
    individual glyphs.

    Another problem that I see with Dean's suggestion is that damage often makes it
    difficult or impossible to be sure just what the damaged character actually is.
    Dean's greying mechanism presupposes that you can identify the character, and
    recognise that it is, for example, "A" with an obliterated lower right corner. In
    very many cases you can recognise that there is half a character, but just which
    character it is may be impossible to tell from the unobliterated portion. Again,
    using IDC-like graphic characters could deal with this situation simply and

    A mechanism that I have seen in scholarly editions of ancient Chinese
    manuscripts to indicate partially preserved ideographs is to use a hollow
    rectangle representing the obliterated/missing part of the ideograph next to the
    preserved part of the ideograph (often a radical or phonetic element).
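    The hollow-rectangle convention can be illustrated in plain text. The Python
    sketch below is only an illustration, under stated assumptions: the preserved
    element is taken, hypothetically, to be the water element 氵 (U+6C35), the lost
    part is written as U+25A1 WHITE SQUARE, and the optional prefix U+2FF0
    IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT (to record that the two parts
    sit side by side) is an assumed convention, not necessarily one used by the
    editions described above. The function names are hypothetical.

```python
import unicodedata
from typing import List

WHITE_SQUARE = "\u25A1"       # U+25A1 WHITE SQUARE, standing in for the lost part
IDC_LEFT_TO_RIGHT = "\u2FF0"  # U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT


def mark_partial(preserved: str, with_idc: bool = False) -> str:
    """Return a plain-text marker for a partially preserved ideograph.

    The preserved component is followed by a hollow square for the missing
    part. If with_idc is True, the pair is prefixed with U+2FF0 to state a
    left-to-right arrangement (a hypothetical convention, for illustration).
    """
    marker = preserved + WHITE_SQUARE
    return IDC_LEFT_TO_RIGHT + marker if with_idc else marker


def describe(text: str) -> List[str]:
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    water = "\u6C35"  # the preserved water element (illustrative choice)
    print(mark_partial(water))
    print(describe(mark_partial(water, with_idc=True)))
```

    Printing the character names makes it easy to check that a transcription
    contains U+25A1 rather than a visually similar square from another block.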

    I might add that when, a couple of years ago, I made a proposal to add a
    character to specifically represent a missing or obscured CJK ideograph (a
    hollow square, which is very widely used in typeset editions of Chinese texts to
    represent a missing ideograph -- see <> if
    you're interested), it was rejected by the UTC on the grounds that the proposed
    character could be represented by U+25A1 [WHITE SQUARE]. If this, the most
    ubiquitous of textual symbols, is not deemed necessary to encode separately, then
    I do wonder whether there is any chance of getting a set of more controversial
    textual symbols accepted.
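    Because U+25A1 [WHITE SQUARE] is already available, the missing-ideograph
    convention can be applied mechanically. The following Python sketch is
    illustrative only: it assumes a transcription held as a sequence in which
    each unreadable ideograph is None, writes U+25A1 in its place, and then
    recovers the position and length of each lacuna. The function names and
    the damaged reading of the Analects phrase 學而時習之 are hypothetical.

```python
from typing import List, Optional, Tuple

WHITE_SQUARE = "\u25A1"  # U+25A1 WHITE SQUARE, one per missing ideograph


def render_transcription(chars: List[Optional[str]]) -> str:
    """Join the read characters, writing U+25A1 for each lost one (None)."""
    return "".join(WHITE_SQUARE if ch is None else ch for ch in chars)


def lacunae(text: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of consecutive U+25A1 characters."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, ch in enumerate(text):
        if ch == WHITE_SQUARE:
            if start is None:
                start = i  # a new lacuna begins here
        elif start is not None:
            runs.append((start, i - start))  # the lacuna ended before i
            start = None
    if start is not None:  # the text ends inside a lacuna
        runs.append((start, len(text) - start))
    return runs


if __name__ == "__main__":
    # Hypothetical reading in which 而 and 時 are illegible.
    line = render_transcription(["學", None, None, "習", "之"])
    print(line)           # 學□□習之
    print(lacunae(line))  # [(1, 2)]
```

    Counting each square as one ideograph keeps the length of a lacuna
    recoverable from plain text, which a single "gap" marker would not.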


    This archive was generated by hypermail 2.1.5 : Tue Mar 08 2005 - 06:28:07 CST