Re: Another take on the English Apostrophe in Unicode

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Tue, 16 Jun 2015 21:08:22 +0200

When ISO 8859-1 was designed (in fact in an early version by Digital for
its own version of Unix), allowing a bijective compatibility with 8-bit
EBCDIC and its C1 controls was still a priority.

Microsoft abandoned its own development of Unix to develop DOS and extend
it with Windows, in parallel with its work with IBM. IBM had wanted DOS to
be a very lightweight version of CP/M, without a scheduler, to run software
on personal computers for small organisations that could not buy its
mainframes but had to prepare documents and data that could be reused on
IBM mainframes...

2015-06-16 19:02 GMT+02:00 Marcel Schneider <charupdate_at_orange.fr>:

> On Mon, Jun 15, 2015, 17:12, Doug Ewell <doug_at_ewellic.org> wrote:
>
> > Marcel Schneider wrote:
> [...]
> >> Microsoft’s choice of mashing up apostrophe and close-quote to end up
> >> with an unprocessable hybrid was wrong. Very wrong.
>
> > Windows-1252 and the other Windows code pages were developed during the
> > 1980s, before Unicode, when almost all non-Asian character sets were
> > limited to 256 code points. The distinctions between apostrophe and
> > right-single-quote, weighed against the confusion caused by encoding two
> > identical-looking characters, would never have been sufficient back then
> > to justify separate encoding in this limited space.
>
> I replied:
>
> > The problem is not about code pages [...]
>
> I thank you for your answers and I'll come back to some of them below.
> There is a new fact to bring up first.
>
> I concede that my last reply, yesterday evening, was incorrect.
>
> In addition to Microsoftʼs action in the late nineties, urging Unicode to
> give up its useful apostrophe recommendation (U+02BC), the design of code
> page Windows-1252 is indeed within my scope.
>
> Since I learned that there are very good, compelling reasons to use U+02BC
> in English, and that Unicodeʼs corresponding recommendation was withdrawn
> in deference to a widespread practice founded on code page Windows-1252, I
> soon suspected that there would have been a way to get the apostrophe into
> this code page. Here I need to recall that I always liked Windows-1252 for
> completing the ISO 8859-1 charset (which was so useless* that it had to be
> replaced with ISO 8859-15).
> * Please read this paper (in French):
> http://cahiers.gutenberg.eu.org/cg-bin/article/CG_1996___25_65_0.pdf
>
> Now that I have closely examined CP1252ʼs layout, I found five empty code
> points, five code points left out, in the C1 ranges that Microsoft
> allocated to complete ISO 8859-1. Further, in these ranges, I found two
> characters now in the Spacing Modifier Letters block: MODIFIER LETTER
> CIRCUMFLEX ACCENT (136, 0x88, later U+02C6) and SMALL TILDE (152, 0x98,
> later U+02DC). Obviously these two were added to distinguish the
> extensively used spacing characters ^ (94, 0x5E) and ~ (126, 0x7E) on one
> side from the diacritics on the other. It must be said that when Windows
> was first released, the left and right single quotes were the only
> printable characters in these two ranges. All other characters, plus × and
> ÷, came later. However, CP1252 has remained stable since Windows 98, for
> which € and the žŽ pair were added. And five places were left empty.
>
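The layout described above can be checked mechanically. The following is a
minimal Python sketch, assuming that Python's built-in cp1252 codec reflects
the code page as discussed here; the helper name cp1252_c1_layout is made up
for illustration. It decodes each byte from 0x80 to 0x9F and records which
ones the codec leaves undefined.

```python
def cp1252_c1_layout():
    """Split bytes 0x80-0x9F into assigned and unassigned per Python's cp1252 codec."""
    assigned = {}    # byte value -> decoded character
    unassigned = []  # byte values the codec refuses to decode
    for byte in range(0x80, 0xA0):
        try:
            assigned[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            unassigned.append(byte)
    return assigned, unassigned


assigned, unassigned = cp1252_c1_layout()

# The five empty positions, including 0x90 (144) next to the quotes.
print([hex(b) for b in unassigned])

# The circumflex, the small tilde and the two single quotes.
for byte in (0x88, 0x98, 0x91, 0x92):
    print(hex(byte), f"U+{ord(assigned[byte]):04X}")
```

Other implementations may treat the five unassigned bytes differently (for
example by passing them through as control codes), so the result only
describes this particular codec.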
> From this I became convinced that it would have been very easy to place
> the letter apostrophe at, for example, code point 144 (0x90), next to the
> single turned comma quotation mark (0x91) and the single comma quotation
> mark (right-single-quote, 0x92), which Microsoft recommended for use as
> apostrophe.
>
> As for the “confusion” everybody refers to, it must be said that the only
> way to get people confused is to do things without explaining anything to
> anybody.
>
> The core problem was presumably that code pages were designed with
> glyph-based *character* encoding in mind, not semantics-based *text*
> encoding.
>
> I repeat that others had done even worse: namely, some of the so-called
> expert members of the ISO WG designing 8859-1, two of whom did not even
> aim at encoding all needed characters and deliberately refused to encode
> the lower- and uppercase Œ digraph, and even the uppercase Ÿ. Microsoftʼs
> great merit was to produce a ready remedy for this bungling, which, as far
> as the OE digraph is concerned, was meant to match defective peripherals.
>
> Unfortunately, Microsoft evidently did not finish this job: it aimed at
> encoding characters only, and thus allocated no more than one code point
> to that squiggle, although several places were left.
>
> Well, all those are errors of the past. If I donʼt see a need, I wonʼt meet
> it. By leaving œ and Œ out of the charset, they at least got × and ÷ in.
> Where things really went wrong was when Unicode arrived and the Procrustean
> beds of code pages were gone. At least, they should have been. Why, then,
> does the CP1252-based confusion survive?
>
> In short, todayʼs text processing suffers from the apostrophe/close-quote
> confusion. This confusion is, first, out of date and, second, was
> unnecessary from the beginning. Avoiding it at a trivial level (by sparing
> users the need to choose between two similar squiggles) shifts it to the
> processing level, where the damage it causes is far greater. Trust me,
> users who find themselves unable to tell the apostrophes apart when they
> want to replace single quotes wonʼt bless Microsoft for the input
> simplicity! Ted Clancyʼs blog post is there to prove it.
>
> https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
>
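The processing-level damage described above can be illustrated with a small
Python sketch. It assumes a naive converter (the hypothetical function
single_to_double, not taken from any real tool) that turns single-quoted
spans into double-quoted ones by pairing each U+2018 with the nearest
following U+2019.

```python
import re


def single_to_double(text):
    """Naively rewrite single-quoted spans (U+2018...U+2019) as double-quoted ones."""
    # Non-greedy match: the first U+2019 after a U+2018 is taken as the close quote.
    return re.sub("\u2018(.*?)\u2019", "\u201c\\1\u201d", text)


# Apostrophe typed as U+2019: indistinguishable from the closing quote.
ambiguous = "\u2018I don\u2019t know\u2019"
# Apostrophe typed as U+02BC: the closing quote is the only U+2019.
distinct = "\u2018I don\u02bct know\u2019"

print(single_to_double(ambiguous))  # the span is closed too early, at the apostrophe
print(single_to_double(distinct))   # the whole span is converted
```

A real tool could use heuristics (word boundaries, dictionaries) to guess
which U+2019 is an apostrophe, but it would still be guessing; with U+02BC
no guess is needed.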
>
> It was time to get rid of that confusion when Unicode recommended U+02BC
> for apostrophe. Microsoftʼs choice not to comply was wrong again. Very
> wrong.
>
>
>
> Let's come back to some of your replies.
>
>
>
> On Mon, Jun 15, 2015, 20:14, Doug Ewell <doug_at_ewellic.org> wrote:
>
>
> > I'd guess there are very few users who consciously see the use of U+2019
> > as both apostrophe and right-single-quote as a vestige of code pages, or
> > as a conscious effort by Evil Microsoft™ to force them into anything.
>
>
>
> Quite sure. These are habits, not constraints. I don't share such views
> about a battle between Google and Microsoft, or about ethical prefixes to
> attach to companies. The problem is that when the result proves to be
> bad, the idea was bad, too.
>
>
>
> The mix-up between apostrophe and close-quote is now part of our
> culture. We must become pragmatic again and look at the advantages and
> disadvantages of each option (ambiguating, disambiguating), not say "I
> believe there are no disadvantages in ambiguating" or "there is no reason
> to disambiguate" or "people will get confused, let them alone" or the like.
> These are all mere assertions. We must look at real people and listen to
> what they say to us. Ted Clancy is one of them. When he's worried about
> this malfunctioning of text processing, who will keep smiling and go on
> saying "There's no problem, there's no reason to fix that, it's all OK
> like it is"?
>
> That is to despise people; that is to spit in their faces.
>
>
>
> > Perhaps a UTC member can confirm whether this is fact or speculation.
> > Markus Kuhn's comment from 1999 about "couldn't Unicode follow
> > Microsoft...?" doesn't prove that Unicode was in fact strong-armed by
> > Microsoft.
>
>
>
> Yes, please let us know.
>
>
>
>
>
> Marcel Schneider
>
Received on Tue Jun 16 2015 - 14:09:41 CDT
