Re: Best 10646/Unicode character for apostrophes?

From: Mark Leisher (
Date: Thu Jul 04 1996 - 12:24:16 EDT

    Olle> I don't agree with the Unicoders that different characters should be
    Olle> used for apostrophe and the closing single quotation mark (U+02BC
    Olle> MARK). They are visually identical, so very few persons that enter
    Olle> text (and no OCR programs) can be trusted to consistently choose the
    Olle> correct character. The distinction between these characters is
    Olle> useless in practice, and one of them should be classified as a
    Olle> compatibility character; I would prefer U+02BC to be so classified.

Although the two characters are visually identical, some text processing tasks
can make use of the distinction between them. An English parser often has to
decide automatically whether a given mark is being used as an apostrophe or as
a single quote, which is not always easy.
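
To illustrate why the parser's job is not always easy, here is a minimal
sketch in Python of the kind of context heuristic such a parser might fall
back on when the text uses a single code for both roles. The function name
classify_marks and its labels are hypothetical, made up for this example, and
the rule (look at the neighbouring letters) is only an assumption about one
plausible approach, not a description of any real parser.

```python
def classify_marks(text, mark="'"):
    """Guess the role of each ambiguous mark from its neighbours.

    Hypothetical heuristic for illustration only: a mark between two
    letters is treated as an apostrophe; anything else stays ambiguous.
    """
    results = []
    for i, ch in enumerate(text):
        if ch != mark:
            continue
        prev_is_letter = i > 0 and text[i - 1].isalpha()
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if prev_is_letter and next_is_letter:
            label = "apostrophe"                   # don't, John's
        elif next_is_letter:
            label = "opening-quote-or-elision"     # 'hello' or 'tis
        elif prev_is_letter:
            label = "closing-quote-or-possessive"  # hello' or dogs'
        else:
            label = "unknown"
        results.append((i, label))
    return results
```

Note that a phrase such as "the dogs' bones" still comes back as
"closing-quote-or-possessive": the context alone does not settle it, whereas
distinct codes in the text would.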

On the other hand, nobody expects OCR software to be smart enough to determine
the appropriate code for the visually identical glyphs, but these kinds of
programs can simply default to one consistent codepoint.
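
As a rough sketch of what "default to one consistent codepoint" could look
like in an OCR post-processing step, the Python below maps every
apostrophe-like character to a single code. The name normalize_marks is
hypothetical, and the choice of U+2019 as the default is purely an assumption
for the example; nothing above says which code an OCR program ought to pick.

```python
# Characters that look alike in most fonts: ASCII apostrophe,
# RIGHT SINGLE QUOTATION MARK, and MODIFIER LETTER APOSTROPHE.
APOSTROPHE_LIKE = {"\u0027", "\u2019", "\u02BC"}


def normalize_marks(text, default="\u2019"):
    """Replace every apostrophe-like character with one default code.

    The default (U+2019) is an assumption made for this sketch; a
    program could equally settle on U+02BC, as long as it is consistent.
    """
    return "".join(default if ch in APOSTROPHE_LIKE else ch for ch in text)
```

A later pass that does understand the text (a parser, or a human editor) could
then re-tag individual marks where the distinction matters.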

Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"The trick is not gaining the knowledge,
but surviving the lessons."
    -- "Svaha," Charles de Lint
