Re: Apostrophes (was Re: Exemplar Characters)

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Nov 15 2005 - 17:40:42 CST


    Chris asked:

    > That’s not what I was talking about at all. It should not matter what the
    > value of ’ in Breton or Mohawk is, nor did I ever say that Breton has a
    > glottal stop.
    >
    > If I may, I’d like to rephrase my question.
    >
    > Language X has the following alphabet:
    >
    > a h i k n p r t u x y ’
    >
    > Point 1: It doesn’t matter what the phonetic realisations of these are to
    > assign a Unicode codepoint. We know that Latin Script a is U+0061
    > regardless of how it’s pronounced.

    Correct.

    >
    > Point 2: We have evidence from Breton that U+2019 is used as part of an
    > alphabetic letter, instead of just punctuation.

    Correct.

    >
    > a is U+0061
    > h is U+0068
    > ...
    > ’ is what?
    >
    > We could choose U+2019 or we could choose U+02BC. Which one is best?
    >
    > I hope this question makes sense.

    It makes sense, but it doesn't have a determinate answer. Either
    one could be the best, depending on the orthographic tradition,
    its use with other languages (with which it might need to share
    letters and keyboards, for example, as in the French/Breton case),
    or other concerns.

    The correct answer might even be U+0027 APOSTROPHE.

    The issue of which *character* an orthography should standardize
    on is an issue for the standardizers of that orthography (if they
    exist). The apostrophe in particular will always be an inherently
    problematical edge case, because it has been used in so many ways,
    has never graduated to bona fide Latin letter status, overlaps with
    punctuation uses of similar signs, and now has at least 3 forms to
    choose from in Unicode.

    U+0027 is weighted towards ASCII compatibility
    U+02BC is weighted towards ease of word selection
    U+2019 avoids glyph ambiguity, and is more available for input than U+02BC

    You just have to take some bad with the good for each, and make a
    choice.
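    To make the three trade-offs above concrete, here is a small Python
    sketch that uses only the standard library (unicodedata, re). It
    prints each candidate's General Category and checks whether it is
    ASCII-encodable. It then splits a hypothetical Language X word with a
    deliberately naive tokenizer (runs of \w). This is an assumption for
    illustration, not how any particular editor selects words. Real
    word selection usually follows UAX #29, under which an apostrophe
    between two letters generally does not break a word. The difference
    is most visible at the start or end of a word, where only a letter
    such as U+02BC stays attached.

```python
import re
import unicodedata

# The three candidate apostrophe characters discussed above.
CANDIDATES = ["\u0027", "\u02BC", "\u2019"]


def describe(ch):
    """Return a few Unicode properties relevant to the trade-offs."""
    try:
        ch.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # Po, Lm or Pf
        "is_letter": ch.isalpha(),             # True only for Lm here
        "ascii": ascii_ok,                     # True only for U+0027
    }


def naive_words(text):
    # Deliberately naive tokenizer: runs of Unicode word characters.
    # This is NOT the UAX #29 word-boundary algorithm.
    return re.findall(r"\w+", text)


for ch in CANDIDATES:
    print(describe(ch))

# Hypothetical strings built from the Language X alphabet.
for ch in CANDIDATES:
    print(f"U+{ord(ch):04X}", naive_words(f"ta{ch}ka"), naive_words(f"nuh{ch}"))
# U+0027 ['ta', 'ka'] ['nuh']
# U+02BC ['taʼka'] ['nuhʼ']
# U+2019 ['ta', 'ka'] ['nuh']
```

    Because U+02BC is a letter (Lm), a tokenizer that keys on letters
    keeps it inside the word. U+0027 and U+2019 are punctuation (Po and
    Pf), so they fall out unless the tokenizer special-cases them.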

    By the way, for Mohawk in particular, I think groups like these
    are the appropriate ones to be deciding:

    http://www.edu.gov.on.ca/eng/training/literacy/mohawk/mohawk.html
    http://www.edu.gov.on.ca/eng/training/literacy/mohawk/mohawk1.html

    (The second link is *in* Mohawk.)

    As of 1993, the correct answer there was U+0027. I don't know where
    things stand now, or if materials have changed. Clearly, the *easiest*
    thing to do to get Mohawk materials online is use U+0027, regardless
    of any ambiguities of form and function for that character.

    --Ken



    This archive was generated by hypermail 2.1.5 : Tue Nov 15 2005 - 17:41:47 CST