RE: U+2018 is not RIGHT HIGH 6 from Michael Probst on 2012-05-03 (Unicode Mail List Archive)

From: Michael Probst <michael.probst03_at_web.de>
Date: Thu, 03 May 2012 13:56:58 +0200

On Wednesday, 2 May 2012, 13:46 -0700, Doug Ewell wrote:
> Werner LEMBERG <wl at gnu dot org> wrote:
>
> >> So if two glyphs have enough "visual character" to be used in one
> >> document to express two different meanings, then they should be
> >> encoded as different characters?
> >
> > Yes, more or less.
>
> I don't think that's what was originally said. A quotation-mark
> character that can be used as either an opening or closing mark, but
> that doesn't kern correctly in some fonts when used as a closing mark,
> does not seem to justify disunification.

Definitely not. Some kerning issues would simply have been solved in
passing had U+0022 and U+0027 been logically "disunified", so that not
only U+2018 and U+201C but also U+201A and U+201E had got their
counterparts.

> > However, quotation characters need language
> > tagging or something like that; you certainly don't want to have the
> > situation to ask whether ' is the Byzantine opening quote, or ' the
> > Martian alternate closing quote, or ' the you-name-it. It's a
> > delicate issue.

The properties 'open' and 'close' were, in effect, "tagged onto" U+0027
('), and the "tagged versions" were encoded as U+2018 and U+2019. It
seems only reasonable to do no less for U+002C (,), and the same goes
for adding the 'double' "tag".
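To make this "tagging" concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It prints the character name and general category of each code point mentioned above. The ASCII marks (U+0022, U+0027, U+002C) are "other punctuation" (Po), with no opening/closing distinction, while U+2018/U+2019 and U+201C/U+201D are split into initial (Pi) and final (Pf) quotes, and the low-9 marks U+201A and U+201E are classified as open punctuation (Ps). The helper name `describe` is just illustrative, and the exact output depends on the Unicode database bundled with your Python version.

```python
import unicodedata

# Code points discussed in the thread: the ASCII "neutral" marks and the
# typographic quotation marks that Unicode encodes as separate characters.
CODE_POINTS = [0x0022, 0x0027, 0x002C,
               0x2018, 0x2019, 0x201A,
               0x201C, 0x201D, 0x201E]

def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

if __name__ == "__main__":
    for cp in CODE_POINTS:
        name, cat = describe(cp)
        # Pi = initial quote, Pf = final quote, Ps = open punctuation,
        # Po = other punctuation (no open/close distinction)
        print(f"U+{cp:04X}  {cat}  {name}")
```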

Had they been restricted to using Courier, would they have said: "Hey,
look, U+0027, U+2018 and U+2019 are the same. Let's unify them as, say,
U+0027."?

This has actually happened, just the other way round:

Had they been restricted to a font like TeX Gyre Pagella, everyone would
have said that 'left opening high 6' (now encoded at U+2018) is not the
same as 'right closing high 6', because the difference between them,
which is the same as that between U+2018 and U+2019, would have been
visible.

But it wasn't visible.

Yet a potentially invisible difference was no reason not to "tag"
'minus' and 'en dash' onto U+002D and encode the "tagged" versions as
U+2212 and U+2013.
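The same standard-library check, as a minimal sketch, shows that the hyphen-minus and the two characters described above as its "tagged" versions are distinct code points with their own names and categories, even though many fonts draw them nearly alike. As before, the helper `describe` is illustrative only.

```python
import unicodedata

# Hyphen-minus and the two characters the message calls its "tagged"
# versions. Their glyphs may look alike, but they are separate code points.
DASH_LIKE = [0x002D, 0x2212, 0x2013]

def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

if __name__ == "__main__":
    for cp in DASH_LIKE:
        name, cat = describe(cp)
        # Pd = dash punctuation, Sm = math symbol
        print(f"U+{cp:04X}  {cat}  {name}")
```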

> You don't want to turn this into a "German language" issue and have the
> solution not work for quotational material in other languages that use
> the same written conventions.

Not at all. The issue of a missed "tagging" or disunification, or a
wrong(?) unification, is *usage-specific* (and about logic, reason and
consistency), *not language-specific*, though usage depends on language.

Michael
Received on Thu May 03 2012 - 07:00:02 CDT