Re: a character for an unknown character

From: Rebecca T <637275_at_gmail.com>
Date: Wed, 21 Dec 2016 08:11:16 +0000

U+FFFD REPLACEMENT CHARACTER �
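
For readers unsure how U+FFFD normally turns up, here is a minimal Python 3 sketch that uses only the standard `unicodedata` module and the built-in `errors="replace"` decoding mode. It shows a decoder emitting U+FFFD for a byte that is not valid UTF-8. It then compares the general categories of U+FFFD, the TCP placeholder U+25CF and the ASCII SUB control. The first two are symbols (`So`) and SUB is a control (`Cc`), which is the distinction the quoted reply draws.

```python
import unicodedata

# 0xFF can never occur in well-formed UTF-8, so this input is undecodable.
raw = b"Th\xffs"

# errors="replace" substitutes U+FFFD for the offending byte.
text = raw.decode("utf-8", errors="replace")
print(unicodedata.name(text[2]))        # REPLACEMENT CHARACTER

# The placeholder used in the TCP transcriptions is an ordinary symbol.
placeholder = "\u25cf"
print(unicodedata.name(placeholder))    # BLACK CIRCLE

# General categories: symbols are "So", ASCII SUB (0x1A) is a control ("Cc").
for ch in ("\ufffd", placeholder, "\x1a"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))
```

The loop prints `U+FFFD So`, `U+25CF So` and `U+001A Cc`. Note that `unicodedata.name` is not called on SUB, because control characters have no name in that module and the call would raise `ValueError`.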

On Wed, Dec 21, 2016 at 3:05 AM Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> There's a "replacement" control whose rendering is undefined. It may
> represent any missing part covering more than one character, such as parts
> that have been burned or struck through. This Unicode character can act as a
> substitute, but its rendering is purposely undefined. An application may
> show some greyed box there, but it should not be the tofu box used for
> characters not mapped in the specified fonts.
> Older encodings used the ASCII control "SUB" to represent this
> function, and some terminals displayed it as a filled box. Other documents
> have used the ASCII DEL control for the same purpose. However, in Unicode
> encodings ASCII controls should be avoided.
>
> This is not an Emoji, as Emojis have a clear visual representation and
> semantics (and often specific colors). But you're right, it should be a
> symbol in Unicode (like Emojis, but unlike ASCII controls).
>
> 2016-12-21 3:29 GMT+01:00 Martin Mueller <martinmueller_at_northwestern.edu>:
>
> I’m new to this list. Please excuse my technical incompetence.
>
> Is there a Unicode character that says “I represent an alphanumerical
> character, but I don’t know which”? This is a very common problem in the
> transcription of historical texts where you have lacunas. Often the
> extent of the lacuna is known, and the alphabet is known as well. The
> EEBO TCP transcriptions of English texts before 1700 are good examples.
> They are SGML transcriptions, where missing stuff is represented by <gap/>
> elements with attributes about this or that. This is efficient when it
> comes to pages, very inefficient when it comes to individual characters.
>
> There is a Web character, a diamond with a question mark inside it, which
> means “I may know what this character represents, but I can’t display it”.
> That is a very different message. On the other hand, if you extended the
> use of that character, it probably wouldn’t create much ambiguity.
>
> In the TCP project, various code points from the Geometric Shapes block
> were used to represent lacunae. The black circle (\u25cf) has been used as
> the character for a missing character. This is OK and unambiguous in its
> context, but it would be nice to have a special character for just that
> purpose, and given the number of emoji, this doesn’t seem to be a
> particularly frivolous request. Which alphabet, you might ask. But that
> doesn’t really matter. There is a very high probability that the missing
> character comes from the character set of the surrounding words. And if
> that isn’t the case, the transcriber wouldn’t know it. S/he sees that
> there is something, perhaps even that there is just one of it, but
> doesn’t know which.
>
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University
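
The observation that a missing character very probably comes from the character set of the surrounding words can be turned into a simple candidate search. The Python sketch below is hypothetical: `candidate_pattern`, the sample context and the word list are illustrative assumptions, not part of the TCP tools or markup. Each U+25CF placeholder becomes a regex character class built from the letters that appear in the nearby text. The resulting pattern is then matched against a list of known words.

```python
import re

# U+25CF BLACK CIRCLE, the TCP convention for one missing character.
PLACEHOLDER = "\u25cf"


def candidate_pattern(word, context):
    """Hypothetical helper: compile a regex in which each placeholder
    stands for exactly one letter taken from the surrounding context."""
    alphabet = sorted({ch for ch in context if ch.isalpha()})
    if not alphabet:
        raise ValueError("context contains no letters")
    char_class = "[" + re.escape("".join(alphabet)) + "]"
    parts = [char_class if ch == PLACEHOLDER else re.escape(ch) for ch in word]
    return re.compile("".join(parts))


context = "the light of the morning"      # illustrative surrounding text
word = "l" + PLACEHOLDER + "ght"          # transcription with one lacuna
lexicon = ["light", "might", "lught", "laght", "sight"]  # hypothetical word list

pattern = candidate_pattern(word, context)
matches = [w for w in lexicon if pattern.fullmatch(w)]
print(matches)  # ['light']
```

Because the character class is limited to letters actually seen nearby, "lught" and "laght" are rejected even though they have the right shape. With a short context this can also exclude the true reading, so the size of the context window is a judgement call.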
Received on Wed Dec 21 2016 - 02:11:55 CST

This archive was generated by hypermail 2.2.0 : Wed Dec 21 2016 - 02:11:56 CST