From: Jukka K. Korpela (firstname.lastname@example.org)
Date: Sun Feb 04 2007 - 07:21:22 CST
On Sun, 4 Feb 2007, Philippe Verdy wrote:
>> No, the Unicode standard clearly says that U+2019 is preferred as
>> punctuation apostrophe. The character U+0027 should have a neutral
>> (vertical) glyph, and usually has, though in some fonts it's slightly
> But why then are all French spelling autocorrectors changing the weak
> ASCII vertical quote into a curly apostrophe?
My copy of MS Word 2002, with document language set to French (France),
autocorrects U+0027 to U+2019 when it appears inside a word (as in
"l'homme"), which is just fine and quite consistent with what I wrote. As
Asmus wrote in his reply, the ASCII repertoire is largely an input
repertoire, or something that people type in using common keyboards, and
it is quite OK to autocorrect the input in a context- and
language-sensitive way, as long as the user understands what is going on
and knows how to undo or switch off the autocorrections as needed.
For French, my copy of MS Word 2002 autocorrects U+0027 to U+2018 (left
single quotation mark) at the start of a word. This means that by typing
'foo' I get foo surrounded by single quotation marks. Whether such
punctuation is acceptable in French is debatable. I have not managed to
find an authoritative statement on nested quotations (inner quotation
marks) in French; by "authoritative" I mean something issued by a body
such as the French Academy. (The current CLDR data says that inner
quotation marks are double quotation marks, like the normal quotation
marks in English, but I have also seen statements about using single
quotation marks, single guillemets, and even double guillemets.)
Anyway, if a user types 'foo' when typing French, then it is rational to
expect that he wants to get single quotation marks, whether that's
orthographically correct by all books or not.
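
The context-sensitive behaviour described above can be sketched roughly as
follows. This is only an illustration of the idea, not how MS Word actually
implements it; the function name autocorrect_apostrophes and the rule "look
only at the preceding character" are assumptions made for the example.

```python
import re

APOSTROPHE = "\u0027"    # ASCII apostrophe, as typed on common keyboards
LEFT_SINGLE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_SINGLE = "\u2019"  # RIGHT SINGLE QUOTATION MARK (preferred apostrophe)


def autocorrect_apostrophes(text: str) -> str:
    """Hypothetical helper: replace each U+0027 based on the preceding character."""

    def replace(match: re.Match) -> str:
        start = match.start()
        prev = text[start - 1] if start > 0 else ""
        # After a letter or digit: an apostrophe inside a word, or a closing quote.
        if prev.isalnum():
            return RIGHT_SINGLE
        # At the start of the text, or after a space or punctuation: an opening quote.
        return LEFT_SINGLE

    return re.sub(APOSTROPHE, replace, text)


if __name__ == "__main__":
    print(autocorrect_apostrophes("l'homme"))
    print(autocorrect_apostrophes("'foo'"))
```

A real autocorrector would also need language settings and an easy way to
undo or switch off each replacement, which this sketch leaves out.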
> I did not think about the U+02BC modifier letter; it is also an
> alternative encoding with similar rendering, but it looks quite bad
> because of its decomposition properties, and its glyph is the same as
> the acute accent, which is straight and too horizontal.
There's some confusion here, but it's actually water under the bridge.
U+02BC is just something else - neither a normal punctuation apostrophe
nor a quotation mark.
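
One way to see the difference is to look up the character properties. The
following is a small sketch using Python's standard unicodedata module; the
helper name describe is made up for this example. It lists the name, general
category, and decomposition mapping of each character discussed here.

```python
import unicodedata


def describe(code_point: int) -> tuple:
    """Hypothetical helper: return (name, general category, decomposition)."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string when there is no mapping
    )


if __name__ == "__main__":
    for cp in (0x0027, 0x2018, 0x2019, 0x02BC):
        name, category, decomposition = describe(cp)
        print(f"U+{cp:04X} {name} [{category}] decomposition={decomposition!r}")
```

In the Unicode character database, U+0027, U+2018 and U+2019 have punctuation
categories (Po, Pi, Pf), whereas U+02BC is a modifier letter (Lm) with no
decomposition mapping of its own.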
> Note also that French has some usage of the other curly single quote
> also as a letter (for transcribing some languages, notably the Arabic
> aleph) ;
That's because (probably unofficial) transliteration schemes use it, since
their designers did not know of better alternatives or were afraid of using
them due to lack of sufficient software (font) support.
The international standard for romanization of Arabic, ISO 233, uses left
and right half ring to correspond to certain Arabic consonant letters.
Various simple transliteration schemes often use either U+2019 or U+0027
for one of them and leave the other out. By doing so, you choose to use
characters with multiple semantics instead of specific characters. This
might be a practical choice for various reasons, but a more systematic
and less ambiguous alternative exists, too.
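
To illustrate the difference in practice, here is a minimal sketch. The
table names, the function name, and the test string are all made up for the
example, and the pairing of ayn with the left half ring (U+02BF) and hamza
with the right half ring (U+02BE) is my understanding of the convention, so
treat it as an assumption rather than a quotation from ISO 233.

```python
LEFT_HALF_RING = "\u02BF"   # MODIFIER LETTER LEFT HALF RING
RIGHT_HALF_RING = "\u02BE"  # MODIFIER LETTER RIGHT HALF RING
AYN = "\u0639"              # ARABIC LETTER AIN
HAMZA = "\u0621"            # ARABIC LETTER HAMZA

# Assumed pairing, for illustration only: specific modifier letters.
SPECIFIC_TABLE = {AYN: LEFT_HALF_RING, HAMZA: RIGHT_HALF_RING}

# A simplified scheme: one letter becomes U+2019, the other is dropped.
SIMPLE_TABLE = {AYN: "\u2019", HAMZA: ""}


def map_consonant_marks(text: str, table: dict) -> str:
    """Hypothetical helper: replace only the two consonants, keep the rest."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # A partly romanized test string: ayn followed by Latin letters.
    sample = AYN + "arab"
    specific = map_consonant_marks(sample, SPECIFIC_TABLE)
    simple = map_consonant_marks(sample, SIMPLE_TABLE)
    # str.isalpha() counts modifier letters (Lm) as letters, but not punctuation.
    print(specific, specific.isalpha())
    print(simple, simple.isalpha())
```

With the half rings, software that classifies characters by their Unicode
properties sees the whole word as letters; with U+2019, it sees a quotation
mark followed by a word.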
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/