Date: Fri Jun 16 2006 - 01:28:01 CDT

    On Thu, 15 Jun 2006, John Hudson wrote:

    > As far as I'm concerned, the encoding of what the standard clearly
    > acknowledges as a 'glyph variant of 2018' as a separate character is itself
    > contrary to the Unicode character/glyph model.

    I'm afraid the statement in the standard (in the code chart) is not quite
    clear either. What does it mean to say that a character is a glyph variant
    of another character? The natural interpretation, I'd say, is that there
    is only a glyph difference but the character was encoded separately for
    some compatibility reasons, e.g. because some base standard makes such a
    distinction. But then I would expect that the characters are defined as
    compatibility equivalent, or perhaps even canonically equivalent. Yet
    U+2018 and U+201B are two completely distinct characters. This sounds like
    the result of some interesting compromise.
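
    The lack of any formal equivalence can be checked mechanically. The
    following is only a minimal sketch, assuming Python's standard
    unicodedata module and whatever version of the Unicode Character
    Database it bundles. The helper names equivalence_report and
    are_equivalent are made up for this illustration. U+212B ANGSTROM SIGN
    is included purely as a contrast case, since it does have a canonical
    mapping (to U+00C5).

```python
import unicodedata

LEFT_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK (the "6" shape)
REVERSED_9 = "\u201B"   # SINGLE HIGH-REVERSED-9 QUOTATION MARK


def equivalence_report(ch):
    """Hypothetical helper: summarize the normalization data for one character."""
    return {
        "name": unicodedata.name(ch),
        # An empty string means the UCD defines no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
        "changed_by_nfc": unicodedata.normalize("NFC", ch) != ch,
        "changed_by_nfkc": unicodedata.normalize("NFKC", ch) != ch,
    }


def are_equivalent(a, b, form="NFKC"):
    """Hypothetical helper: True if a and b normalize to the same string."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for ch in (LEFT_QUOTE, REVERSED_9):
        print(f"U+{ord(ch):04X}", equivalence_report(ch))
    # Neither canonical (NFC) nor compatibility (NFKC) normalization
    # unifies the two quotation marks.
    print(are_equivalent(LEFT_QUOTE, REVERSED_9, "NFC"))
    print(are_equivalent(LEFT_QUOTE, REVERSED_9, "NFKC"))
    # Contrast: ANGSTROM SIGN is canonically equivalent to U+00C5.
    print(are_equivalent("\u212B", "\u00C5", "NFC"))
```

    If U+201B were merely a compatibility variant, one would expect a
    non-empty decomposition field pointing at U+2018. The data shows none.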

    On the practical side, the standard probably makes its point clear in the
    description of U+2018 in the code chart, when it says "this is the
    preferred glyph (as opposed to U+201B)". The wording is odd (it even calls
    a character a glyph), but apparently the idea is that although U+201B is
    included in the standard and although no formal relationship between it
    and U+2018 is defined, U+2018 and U+201B are essentially two glyph forms
    of the same character, with an expressed preference for the former. By
    "glyph form", I mean the "6"-like shape and the reversed-"9" shape.

    I cannot comment on whether this policy is reasonable, but it seems to
    be the current Unicode policy. Of course, it does not prevent anyone
    from using U+201B if they find it correct for orthographic or
    typographic reasons.

    In general, quotation marks are a problem in encoding characters largely
    because there has been considerable variation in the shapes of quotation
    marks in printed matter and in handwriting, even within a language.
    Until rather recently, it was common to regard e.g. curly quotation marks
    and chevrons just as two different styles for quotes. Publications might
    even use asymmetric quotes in headings but symmetric quotes in copy text,
    etc. So there was a tough decision: which variation is interpreted as a
    character difference?

