Re: a character for an unknown character

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 21 Dec 2016 08:57:56 +0100

There's a "replacement" control whose rendering is undefined. It may
represent any missing part covering more than one character, such as parts
that have been burned or overstruck. This Unicode character can act as a
substitute, but its rendering is purposely undefined. An application may
show a greyed box there, but it should not be the tofu box used for
characters not mapped in the specified fonts.
Older encodings used the ASCII control "SUB" for this function, and some
terminals displayed it as a filled box. Other documents have used the ASCII
DEL control for the same purpose. However, for Unicode encodings, ASCII
controls should be avoided.
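
For reference, here is a small sketch that looks up the code points named in
this thread using Python's standard unicodedata module. The table name
CANDIDATES and the helper describe() are made up for illustration. The choice
of U+FFFD for the "replacement" character is an assumption based on the
"diamond with a question mark" described in the quoted message.

```python
import unicodedata

# Code points mentioned in this thread. The labels are informal, and the
# mapping of "replacement" to U+FFFD is an assumption, not a recommendation.
CANDIDATES = {
    "SUB": "\u001a",
    "DEL": "\u007f",
    "replacement": "\ufffd",
    "black circle": "\u25cf",
}


def describe(ch):
    """Return (code point label, general category, character name).

    Control characters have no formal name in the Unicode Character
    Database, so a placeholder string is returned for them.
    """
    label = f"U+{ord(ch):04X}"
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "<control>")
    return label, category, name


for informal, ch in CANDIDATES.items():
    print(informal, *describe(ch))
```

SUB and DEL report the general category Cc (control), while U+FFFD and
U+25CF report So (symbol, other), which is the distinction drawn above
between ASCII controls and symbols.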

This is not an emoji, as emoji have a clear visual representation and
semantics (and often specific colors). But you're right, it should be a
symbol in Unicode (like emoji, but unlike ASCII controls).

2016-12-21 3:29 GMT+01:00 Martin Mueller <martinmueller_at_northwestern.edu>:

> I’m new to this list. Please excuse my technical incompetence.
>
> Is there a Unicode character that says “I represent an alphanumerical
> character, but I don’t know which”. This is a very common problem in the
> transcription of historical texts where you have lacunas. Often, the extent
> of the lacuna is known, and the alphabet is known as well. The EEBO TCP
> transcriptions of English texts before 1700 are good examples. They are
> SGML transcriptions, where missing stuff is represented by <gap/> elements
> with attributes about this or that. This is efficient when it comes to
> pages, very inefficient when it comes to individual characters.
>
> There is a Web character—a diamond with a question mark inside it—which
> means “I may know what this character represents, but I can’t display it”.
> Which is a very different message. On the other hand, if you extended the
> use of that character, it probably wouldn’t create much ambiguity.
>
> In the TCP project, various code points from the Geometric Shapes block
> were used to represent lacunae. The black circle (\u25cf) has been used as
> the character for a missing character. This is OK and unambiguous in its
> context. But it would be nice to have a special character for just that
> purpose, and given
> the number of emoji, this doesn’t seem to be a particularly frivolous
> request. Which alphabet, you might ask. But that doesn’t really matter.
> There is a very high probability that the missing character comes from the
> character set of the surrounding words. And if that isn’t the case, the
> transcriber wouldn’t know it. S/he sees that there is something, perhaps
> even that there is just one of it, but doesn’t know which.
>
>
>
> Martin Mueller
>
> Professor emeritus of English and Classics
>
> Northwestern University
>
Received on Wed Dec 21 2016 - 01:59:12 CST
