Re: Another take on the English apostrophe in Unicode

From: Marcel Schneider <>
Date: Mon, 15 Jun 2015 08:45:25 +0200 (CEST)

At the following URL, a forum page illustrates how users have struggled for a decade (and more) against the chaotic confusion that Microsoft perpetuated in spite of Unicode, forcing the Committee to adopt its short-sighted views:

Please note Persephoneʼs workaround, which is a way to avoid the Apostrophe Catastrophe without turning off the “smart quotes”. This is the smartest thing Iʼve ever read about “smart quotes”.

This workaround, which I was unaware of, might explain why Microsoft refused to reengineer the smart quotes algorithm: users just have to type two quotes and delete one!

However, the problem of *handling* and *processing* such text remains unresolved. As this page shows, users are aware that a quotation mark is not an apostrophe. But they are compelled to use closing quotes to simulate curly apostrophes. This works on the spot, but it yields poor-quality text files.
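To make the processing problem concrete, here is a minimal sketch in Python. It only assumes the standard `re` and `unicodedata` modules; the helper name `words` is made up for illustration. It shows that a naive word tokenizer splits a contraction written with U+2019, because that character is punctuation, but keeps it whole when written with U+02BC, which is a modifier letter.

```python
import re
import unicodedata

# The same contraction, spelled with the close-quote and with the letter apostrophe.
with_quote = "don\u2019t"   # U+2019 RIGHT SINGLE QUOTATION MARK
with_letter = "don\u02BCt"  # U+02BC MODIFIER LETTER APOSTROPHE

def words(text):
    # Hypothetical helper: \w in Python's re matches Unicode letters,
    # digits and underscore, so punctuation acts as a word boundary.
    return re.findall(r"\w+", text)

print(unicodedata.category("\u2019"), words(with_quote))   # Pf ['don', 't']
print(unicodedata.category("\u02BC"), words(with_letter))  # Lm ['donʼt']
```

Real word processors use more elaborate segmentation rules than this regular expression, so take it as an illustration of the principle rather than of any particular product.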

Regardless of whether this matches Microsoftʼs business model or not, nobody has the right to dissuade font designers from publishing complete fonts! Assigning the glyph of U+2019 to an additional code point (U+02BC) is very easy when creating a font. But because Microsoft compelled Unicode to tell everybody that there is no need for U+02BC in English and that our text files must not contain U+02BC, we have lost sixteen years, and thousands of fonts (including Arial Unicode MS, which surprisingly lacks U+02BC!) are nearly unusable with correct text files because they donʼt include any typographical apostrophe. The exception is that U+0027 is curly in many ornamental fonts, to meet usersʼ expectations.

A ready workaround would thus be to disable the smart quotes and keep U+0027 as apostrophe (only), while entering U+2018/U+2019 by any other means, and afterwards to replace all instances of U+0027 with U+02BC. Or with U+2019, but only just before printing, never for publishing as PDF, and even less for sending as a file or publishing on the internet!
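The replacement step of this workaround can be sketched in a few lines of Python. The function names `to_archival` and `to_print_only` are hypothetical, and the sketch relies on the assumption stated above: U+0027 was typed only as an apostrophe, and the quotation marks U+2018/U+2019 were entered separately.

```python
APOSTROPHE_ASCII = "\u0027"    # what was typed with smart quotes disabled
APOSTROPHE_LETTER = "\u02BC"   # MODIFIER LETTER APOSTROPHE
RIGHT_SINGLE_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

def to_archival(text):
    # Safe only if every U+0027 in the text is an apostrophe, never a quote.
    return text.replace(APOSTROPHE_ASCII, APOSTROPHE_LETTER)

def to_print_only(text):
    # Lossy: afterwards apostrophes and closing quotes share one code point.
    return text.replace(APOSTROPHE_ASCII, RIGHT_SINGLE_QUOTE)

draft = "\u2018It's the users' choice,\u2019 she said."

archival = to_archival(draft)
printed = to_print_only(draft)

print(archival.count(APOSTROPHE_LETTER), archival.count(RIGHT_SINGLE_QUOTE))  # 2 1
print(printed.count(RIGHT_SINGLE_QUOTE))                                      # 3
```

In the archival copy the two apostrophes can still be told apart from the one closing quote. In the print-only copy all three are U+2019, which is why that copy should not be circulated as a file.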

As usual, the status quo, which originated from legacy code pages (already considerably enriched compared to ISO 8859-1, to Microsoftʼs credit), has been justified a posteriori with a lot of mostly biased arguments:

– The approval of U+2019 as apostrophe is based on glyphs and rendering and on a static view of text, leaving out of scope any further processing of the text across documents and languages.

– Unicodeʼs principles are misapplied and even misinterpreted. The principle that different meanings across languages do not need different code points is applied within a single language to argue that semantic distinctions do not need different code points either.

– Some arguments have become obsolete since they were made, such as U+02BC being a “spacing clone of Greek smooth breathing mark” (removed in 5.1) and thus never slanted, whereas in most fonts it has the same shape as U+2019, slanted or curly.

– Another fallacy cites as proof the use of U+2019 as apostrophe in some locales, although this use is itself based on CP1252-inspired practice that goes against the spirit of Unicode.

– Blurring the issue by enumerating the various values of the English apostrophe, which sometimes leads to counting the closing-quote function as a punctuation apostrophe...

In any case, there is nothing worth saving in the status quo. Unfortunately, the mass of wrongly encoded text keeps growing while one discussion follows another. At least that does not hinder publishing good books and newspapers and sending nice mails (on paper, where nobodyʼs asking whatʼs the code point, because thereʼs no need). As for other media, it must be said that hand-processing wrongly encoded text files increases the workload, which is :( for managers but :) for workers, provided that they are actually paid for it.

Marcel Schneider
Received on Mon Jun 15 2015 - 01:46:21 CDT
