RE: Another take on the English apostrophe in Unicode

From: Peter Constable
Date: Sat, 13 Jun 2015 15:10:06 +0000

I should qualify my statement. The Zwicky and Pullum article was a nice piece of linguistic analysis regarding the morphological characteristics of “n’t”. Their remark about the apostrophe, however, was not so much about orthography (which was not the focus of their article) as a way of putting an exclamation point on their findings.

When it comes to orthography, the notion of what constitutes a word of a language is generally pure convention. That’s because there isn’t any single _linguistic_ definition of “word” that gives the same answer whether phonological, morphological, or syntactic criteria are applied. There are book-length works on just this topic, such as this:

Di Sciullo, Anna Maria, and Edwin Williams. 1987. On the definition of word. (Linguistic Inquiry monograph fourteen.) Cambridge, Massachusetts, USA: The MIT Press.


From: [] On Behalf Of Philippe Verdy
Sent: Saturday, June 13, 2015 12:03 AM
To: Peter Constable
Cc: Kalvesmaki, Joel; Unicode Mailing List
Subject: Re: Another take on the English apostrophe in Unicode

I disagree: U+02BC already qualifies as a letter (even if it is not specific to the Latin script and is not dual-cased). It is perfectly integrable in language-specific alphabets and we don't need another character to encode it once again as a letter.

So the only question is about choosing between:
- on one side, U+02BC (the existing apostrophe letter) and other possible candidate letters for alternate forms, including U+02C8 for the vertical form and the common fallback letter U+00B4, which is present in many legacy fonts for systems built before the UCS was standardized and using legacy 8-bit charsets such as ISO 8859-1;
- and on the other side, U+2019, which is encoded as a quotation punctuation mark (as is the legacy ASCII single quote).
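As a rough illustration of the choice above, the following Python sketch prints the general category that the Unicode Character Database assigns to each candidate, using only the standard `unicodedata` and `re` modules. The helper names `describe`, `is_letter`, and `words` are made up for this example and are not from the thread. It also shows one practical consequence, under the assumption that a simple `\w+` regular expression stands in for word segmentation: a modifier letter stays inside the word, while the punctuation mark splits it. Note that the database classifies U+00B4 as a modifier symbol (Sk), not as a letter.

```python
import re
import unicodedata

# Candidates named in the message, plus the ASCII quote for comparison.
CANDIDATES = ["\u02BC", "\u02C8", "\u00B4", "\u2019", "\u0027"]


def describe(ch):
    """Return (code point, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


def is_letter(ch):
    """True when the general category is one of the letter categories (L*)."""
    return unicodedata.category(ch).startswith("L")


def words(text):
    """Naive word segmentation: runs of characters matched by \\w."""
    return re.findall(r"\w+", text)


for ch in CANDIDATES:
    code, cat, name = describe(ch)
    print(f"{code}  {cat}  {name}  letter={is_letter(ch)}")

print(words("don\u02BCt"))  # spelled with MODIFIER LETTER APOSTROPHE
print(words("don\u2019t"))  # spelled with RIGHT SINGLE QUOTATION MARK
```

Real text segmentation (for example, the rules in UAX #29) is more elaborate than `\w+`, so this only hints at why the letter-versus-punctuation distinction matters to software.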

Note that U+00B4 (from ISO 8859-1) has also been used together with U+0060 (from ASCII) to replace the more ambiguous ASCII quote U+0027 by assigning it an orientation. The exact shape of these two varies: a thin rectangle, a wedge, or a curly comma (shaped like the digits 6 and 9), and the exact angle also varies when the shape is a wedge or thin rectangle. However, these characters have long been used in overstriking mode to add accents over Latin capital letters, so the curly comma shapes are very uncommon and the glyphs are more horizontal than vertical. U+00B4 will therefore be a very poor candidate for the apostrophe, which should have a narrow advance width.

So in practice U+02BC and U+02C8 remain for this apostrophe letter. Which one you use is a matter of preference, but U+02C8 will not be used if there are two distinct apostrophes in the language. That is the case, for example, in Polynesian languages, where the distinction was made even clearer by using right or left rings U+02BE/U+02BF, or glottal letters U+02C0/U+02C1 if that letter has a very distinctive phonetic realisation as a plain consonant with two variants as in Arabic, or even U+02B0 when this is just a breath without a stop. The full range U+02B0-U+02C1 offers more than enough variation for this letter if you need slight phonetic distinctions.
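To see what the range U+02B0-U+02C1 mentioned above actually contains, a short Python sketch can list each code point with its general category and name from the standard `unicodedata` module. The function name `modifier_letters` is invented for this example. The output depends on the Unicode data bundled with the Python build, though these characters have been encoded for a long time.

```python
import unicodedata


def modifier_letters(first=0x02B0, last=0x02C1):
    """Return (code point, general category, name) for each character in the range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch)))
    return rows


for code, cat, name in modifier_letters():
    print(code, cat, name)
```

Every entry in this range carries the category Lm (modifier letter), which fits the argument that an apostrophe-like letter can be picked from it without encoding a new character.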

2015-06-13 8:28 GMT+02:00 Peter Constable:
Nice article, as I recall. (Been a long time.)


-----Original Message-----
From: Unicode [<>] On Behalf Of Kalvesmaki, Joel
Sent: Friday, June 5, 2015 7:27 AM
To: Unicode Mailing List
Subject: Re: Another take on the English apostrophe in Unicode

I don't have a particular position staked out. But to this discussion should be added the very interesting work done by Zwicky and Pullum arguing that the apostrophe is the 27th letter of the Latin alphabet. Neither U+2019 nor U+02BC would satisfy that position. See:

Zwicky, Arnold M., and Geoffrey K. Pullum. "Cliticization vs. Inflection: English N'T." Language 59, no. 3 (1983): 502-513.

It's nicely summarized and discussed here:

Joel Kalvesmaki
Editor in Byzantine Studies
Dumbarton Oaks
202 339 6435

Received on Sat Jun 13 2015 - 10:10:39 CDT
