Re: Best 10646/Unicode character for apostrophes?

From: Michael Everson (
Date: Thu Jul 04 1996 - 06:27:36 EDT

At 14:14 1996-07-03, Olle Järnefors wrote:

>In ISO 10646, the character names indeed suggest that
>U+0027 should be used for apostrophes and U+2019 as the
>single quotation mark. But ISO character names are
>occasionally misleading, and there is a very good reason
>for using the _same_ 10646/Unicode character for both
>apostrophe and closing single quotation mark: not even
>with a microscope can one tell apostrophes from such
>quotation marks in print, and visual identity in _all_
>contexts implies character identity (except when
>different scripts are involved).

Well, that's what we have. The problem isn't the characters; it's defining
functions for the characters -- how punctuation marks behave when you
double-click a word to select it is the most obvious problem (and the one
that gave rise to this discussion). I certainly agree with you that
character names are important as guides to implementors and users.
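To make the double-click problem concrete, here is a minimal Python sketch of a deliberately naive word-selection rule. It assumes, for illustration only, that letters, combining marks and digits are word characters and everything else ends a word. The functions `is_word_char` and `select_word` are hypothetical, not any real editor's code, and the general categories come from the Unicode data bundled with Python today rather than from the 1996 standard. Under this rule only the U+02BC spelling of "don't" is selected as one word, because U+02BC is a letter while U+0027 and U+2019 are punctuation.

```python
import unicodedata

def is_word_char(ch: str) -> bool:
    # Naive rule (an assumption for illustration): letters (L*), combining
    # marks (M*) and numbers (N*) belong to a word; all punctuation ends it.
    return unicodedata.category(ch)[0] in "LMN"

def select_word(text: str, index: int) -> str:
    """Hypothetical double-click handler: grow the selection left and right
    from `index` while the neighbouring characters are word characters."""
    if not is_word_char(text[index]):
        # Clicking on punctuation selects just that character.
        return text[index]
    start = index
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = index + 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[start:end]

# Spell "don't" with each of the three candidate apostrophes and
# simulate a double-click on the first letter.
for apostrophe in ("\u0027", "\u2019", "\u02bc"):
    sample = f"don{apostrophe}t"
    print(f"U+{ord(apostrophe):04X}: {select_word(sample, 0)!r}")
```

For comparison, the default word-boundary rules in Unicode Standard Annex #29 special-case U+0027 and U+2019 when they occur between letters, so a segmenter following that annex keeps "don't" together with either character. That is one way of defining the behaviour without changing the characters themselves.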

>The Unicode book (version 1.0) has another theory:
>: neutral (vertical) glyph having mixed usage
>: preferred character for apostrophe is 02BC
>: preferred character for opening single quotation mark is 2018
>: preferred character for closing single quotation mark is 2019
>(Here quotation mark conventions identical to those of the
>English language are tacitly presupposed.)

Oh, quit bashing English. If you mean "preferred character for opening
single quotation mark AND closing single quotation mark in Swedish is 2019",
then say so, Olle. (You know I work on behalf of languages of limited
diffusion -- how's that for PC? -- Olle, but really, I don't understand
how English became this huge scapegoat for everything. It would be nice
to point out that there are at least HUNDREDS of languages that use the
same quotation mark conventions referred to in Unicode 1.0. Putting it the
way you have makes it look like some kind of conspiracy is going on!)

>The Unicode book is correct in the observation that
>U+0027 is not a good representation of either apostrophe
>or closing single quotation mark. It is merely the
>"ambiguous" or "neutral" character, invented with the
>typewriter, that can be used for both apostrophes and
>single quotes when better characters are not available
>(as on typewriters and in restricted character sets
>such as ASCII, ISO 6937 and ISO 8859).

Be fair. It was a useful and sensible invention given the constraints of
the technology.

>I don't agree with the Unicoders that different
>characters should be used for apostrophe and the closing
>single quotation mark (U+02BC MODIFIER LETTER APOSTROPHE
>and U+2019 RIGHT SINGLE QUOTATION MARK). They are visually
>identical, so very few people who enter text (and no
>OCR programs) can be trusted to consistently choose the
>correct character. The distinction between these
>characters is useless in practice, and one of them should
>be classified as a compatibility character; I would
>prefer U+02BC to be so classified.

I always thought that the MODIFIER LETTER APOSTROPHE was meant to be a
LETTER of the alphabet, used by languages (such as Ukrainian or Skolt
Sámi or many North American languages?) that write such a letter as a
mark of length or as a glottal stop. I hadn't noticed that it was meant
to be a mark of punctuation, and I find that quite strange.

