Best 10646/Unicode character for apostrophes?

From: Olle Jarnefors (ojarnef@admin.kth.se)
Date: Wed Jul 03 1996 - 15:58:00 EDT


(I crosspost this message to the ISO10646 and Unicode
lists, in the hope of provoking comments from
extra-European experts.)

On the tc304wg2 list Michael Everson wrote, quoting me:

> >(Note that in MES, U+2019 (RIGHT SINGLE QUOTATION MARK),
> >not U+0027 (APOSTROPHE), should be used as single quote.
>
> "Should"? I use U0027 in my Web pages.

You're right and I was wrong. The correct coding of
apostrophes and single quotation marks in 10646/Unicode
isn't as simple as I thought when I wrote that. There
seem to be too many possible characters to use in
10646/Unicode. Most WWW readers support too few, and
U+0027 is at present the least bad choice on the web.

In ISO 10646, the character names indeed suggest that
U+0027 should be used for apostrophes and U+2019 as
single quote character. But ISO character names are
occasionally misleading, and there is a very good reason
for using the _same_ 10646/Unicode character for both
apostrophe and closing single quotation mark: Not even
with a microscope can one tell apostrophes from such
quote characters in print, and visual identity in _all_
contexts implies character identity (except when
different scripts are involved).

(Many languages, including English and Swedish, use this
character for closing a single quote, but there may be
languages using some other quotation mark in this
situation.)

The Unicode book (version 1.0) has another theory:

: 0027 ' APOSTROPHE-QUOTE
: = ISO APOSTROPHE
: neutral (vertical) glyph having mixed usage
: preferred character for apostrophe is 02BC
: preferred character for opening single quotation mark is 2018
: preferred character for closing single quotation mark is 2019

(Here quotation mark conventions identical to those of
the English language are tacitly presupposed.)
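
(For readers who want to inspect the four candidate
characters themselves, here is a minimal sketch in Python.
It assumes the standard unicodedata module, which reflects
whatever Unicode version ships with the interpreter, not
the version 1.0 book quoted above, so the names it prints
can differ from the annotations in the book. The helper
name describe is only illustrative.)

```python
import unicodedata

# The four characters discussed in this message.
CANDIDATES = ["\u0027", "\u02BC", "\u2018", "\u2019"]

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in CANDIDATES:
    print(*describe(ch))
```

(The general category in the last column is where the
database separates the characters: U+02BC is classed as
a modifier letter, the other three as punctuation.)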

The Unicode book is correct in the observation that
U+0027 is not a good representation of either apostrophe
or closing single quotation mark. It is merely the
"ambiguous" or "neutral" character, invented with
type-writers, that can be used for both apostrophes and
single quotes, when better characters are not available
(like on type-writers and in restricted character sets
like ASCII, ISO 6937, ISO 8859).

I don't agree with the Unicoders that different
characters should be used for apostrophe and the closing
single quotation mark (U+02BC MODIFIER LETTER APOSTROPHE
and U+2019 RIGHT SINGLE QUOTATION MARK). They are visually
identical, so very few people who enter text (and no
OCR programs) can be trusted to consistently choose the
correct character. The distinction between these
characters is useless in practice, and one of them should
be classified as a compatibility character; I would
prefer U+02BC to be so classified.

Unfortunately, the Unicode character database
<ftp://unicode.org/pub/MappingTables/UnicodeData-2.0.12.txt>
which I suppose is the final version, does not include
any hint of the very intimate relationship between U+02BC
and U+2019.
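
(The absence of such a hint can be checked mechanically.
The following is only a sketch, assuming Python's standard
unicodedata module, whose data comes from a later version
of the database than the 2.0.12 file named above. The
helper name has_compat_link is hypothetical. It looks at
the decomposition field, which is where a compatibility
relationship between two characters would be recorded.)

```python
import unicodedata

def has_compat_link(a, b):
    """True if either character's decomposition field mentions the other."""
    code_a, code_b = f"{ord(a):04X}", f"{ord(b):04X}"
    return (code_b in unicodedata.decomposition(a).split()
            or code_a in unicodedata.decomposition(b).split())

# Does the database tie U+02BC to U+2019 in either direction?
print(has_compat_link("\u02BC", "\u2019"))

# Does compatibility normalization fold U+02BC into U+2019?
print(unicodedata.normalize("NFKC", "\u02BC") == "\u2019")
```

(For comparison, has_compat_link("\u00A0", " ") is true,
because NO-BREAK SPACE is recorded as a compatibility
variant of SPACE. That is the kind of entry I would like
to see for U+02BC.)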

/Olle


