From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Mar 08 2005 - 06:26:37 CST
On Mon, 07 Mar 2005 22:33:04 -0500, Dean Snyder wrote:
>
> Kenneth Whistler wrote at 5:50 PM on Monday, March 7, 2005:
>
> >What *could* be appropriate for encoding as characters, from the
> >fields of paleography and epigraphy here, would be entire symbols
> >indicating quadrant damage -- in other words, some thematic take on
> >sets of quadrant symbols such as U+2596..U+259F, U+25E7, U+25E8,
> >U+25F0..U+25F3, etc, which might reflect use in text to *discuss*
> >glyph damage and lacunae, etc. This would be quite different from
> >trying to encode a bunch of format controls to actually make
> >the text *render* with damage and lacunae.
>
> But the real need is for sometimes very significant historic character
> damage to travel everywhere with plain text representations of that text.
>
I would disagree with this assertion. For palaeographers and epigraphers the
primary reference to a text is not going to be a plain text representation, but
a photograph, scan, rubbing or even a hand-drawing. The secondary reference will
be a critical transcription or transliteration of the text which describes the
physical and palaeographical features of the text, and notes any textual damage
and lacunae, and where necessary provides conjectural readings for damaged
characters. Plain text can never faithfully represent all the palaeographical
details of a text, and no palaeographer in their right mind would attempt to do
so. When I was examining 13th-century tombstones with 'Phags-pa inscriptions in
Quanzhou (Marco Polo's Zayton) recently, I took digital photos and made drawings
and notes in my notebook, and back home, when I transcribed the text into
Unicode on my computer, I wrote it in an HTML page, and added numbered
annotations about individual characters where necessary. At no point did it
occur to me to try to represent the text in unannotated plain text format ...
and if I had, what use would it have been to anyone?

Greying quadrants of a glyph is a very blunt instrument, as damage does not
necessarily resolve itself into neat squares. In any case, such a mechanism can
only give a hint of the actual damage, and other scholars will still have to go
back to a facsimile of the original text to confirm the precise nature of the
damage and confirm the supplied reading. Furthermore, although greying may work
OK on a computer screen (and that's debatable), when you print out the plain
text there's a good chance that the greying will not be obvious to the reader. I
agree with Ken and Doug that a more visible means of identifying damage, such as
special IDC-like characters, would be much more appropriate, and would avoid all
the implementation issues associated with localised greying of parts of
individual glyphs.

Another problem that I see with Dean's suggestion is that damage often makes it
difficult or impossible to be sure just what the damaged character actually is.
Dean's greying mechanism presupposes that you can identify the character, and
recognise that it is, for example, "A" with obliterated lower right corner. In
very many cases you can recognise that there is half a character, but just which
character it is may be impossible to tell from the unobliterated portion. Again,
using IDC-like graphic characters could deal with this situation simply and
effectively.

A mechanism that I have seen in scholarly editions of ancient Chinese
manuscripts to indicate partially preserved ideographs is to use a hollow
rectangle representing the obliterated/missing part of the ideograph next to the
preserved part of the ideograph (often a radical or phonetic element).

I might add that when, a couple of years ago, I made a proposal to add a
character to specifically represent a missing or obscured CJK ideograph (a
hollow square, which is very widely used in typeset editions of Chinese texts to
represent a missing ideograph -- see <www.babelstone.co.uk/Unicode/CJK.pdf> if
you're interested), it was rejected by the UTC on the grounds that the proposed
character could be represented by U+25A1 [WHITE SQUARE]. If this, the most
ubiquitous of textual symbols, is not deemed necessary to encode separately, then
I do wonder whether there is any chance of getting a set of more controversial
textual symbols accepted.
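
For anyone who wants to see what the U+25A1 fallback looks like in practice, here is a minimal Python sketch. The names `LACUNA` and `transcribe` and the sample readings are hypothetical, made up purely for illustration; only the `unicodedata` module is real (Python standard library). It assumes each character position is recorded either as a reading or as `None` where the ideograph is missing.

```python
import unicodedata

# U+25A1 WHITE SQUARE, used here as the placeholder for a missing ideograph.
LACUNA = "\u25a1"

def transcribe(readings):
    """Join per-character readings, writing LACUNA wherever a reading is None."""
    return "".join(LACUNA if r is None else r for r in readings)

# Hypothetical sample: five character positions, three of them unreadable.
line = transcribe(["天", None, "地", None, None])
print(line)
print(unicodedata.name(LACUNA))
```

Note that this only records that a character is missing. It says nothing about how much of the character survives or what the reading might be, which still has to go in annotations.
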
Andrew