UAX29-12 text boundaries: the case of apostrophes

From: Philippe Verdy
Date: Wed Jul 25 2007 - 07:49:56 CDT

    UAX#29 includes an optional rule for handling apostrophes, but it only
    discusses French and Italian, where the apostrophe marks the elision of
    letters and the contraction of two words:

            Break between apostrophe and vowels (French, Italian).
            WB5a. apostrophe ÷ vowels
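
    As a rough illustration of the rule quoted above, here is a minimal
    sketch, not the UAX#29 algorithm itself. The function name, the vowel list
    and the accent-stripping step are assumptions made for this example; only
    the two apostrophe characters (U+0027 and U+2019) come from the rule.

```python
import unicodedata

# The two apostrophe characters the rule accepts.
APOSTROPHES = {"\u0027", "\u2019"}
# Assumption: a crude Latin vowel list; the rule itself only says "vowels".
BASE_VOWELS = set("aeiou")


def is_vowel(ch):
    """Return True if the base letter of ch (accents stripped) is a vowel."""
    base = unicodedata.normalize("NFD", ch)[0].lower()
    return base in BASE_VOWELS


def split_after_apostrophe(word, apostrophes=APOSTROPHES):
    """Split word wherever an apostrophe is immediately followed by a vowel."""
    pieces = []
    start = 0
    for i in range(len(word) - 1):
        if word[i] in apostrophes and is_vowel(word[i + 1]):
            pieces.append(word[start:i + 1])  # keep the apostrophe on the left
            start = i + 1
    pieces.append(word[start:])
    return pieces


print(split_after_apostrophe("l'amour"))       # French elision
print(split_after_apostrophe("dell\u2019acqua"))  # Italian elision
print(split_after_apostrophe("Bob's"))         # no vowel follows, no break
```

    The apostrophe stays attached to the elided word on its left, which is
    one possible reading of "apostrophe ÷ vowels".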

    Note that this case also occurs in English: in "It's a pity...", the words
    are effectively "it" and a contraction of the verb "is" with its leading
    letter elided. It is more ambiguous in English, however, because "'s" is
    also used as a genitive suffix, as in "Bob's friend", and sometimes as a
    plural suffix. The genitive suffix is often contracted further, dropping
    the "s" and keeping only the apostrophe after a word ending in "s" or "sh",
    as in "Tess' friends"; in a plural, the apostrophe is instead often
    replaced by an "e".

    However, the rule only allows the ASCII apostrophe (U+0027) and the right
    curly apostrophe (right single quotation mark, U+2019).
    There is also the case where the apostrophe is used as a glottal mark when
    transcribing Polynesian languages such as Tahitian.

    For example, the name of the city of "Faa‘a" would better be written with a
    left curly apostrophe, i.e. U+2018 (or sometimes with a more technical
    character that is almost never seen), but it is commonly written with one
    of the two other apostrophes. In the official French IGN toponymy and the
    INSEE administrative division names, this Polynesian glottal letter is
    simply omitted, producing just "Faaa". In practice the character is most
    often chosen on typographical grounds alone; they are recognized as
    equivalent in these

    Is there a way to include this U+2018 (left single 6-shaped quotation mark)
    as another possible encoding for this apostrophe character?
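
    To make the question concrete, the following sketch shows one way an
    application could treat the three characters as equivalent encodings of
    the same apostrophe. The table and the helper name are hypothetical and
    are not part of UAX#29; U+2018 is included only because it is the
    addition being asked about.

```python
# Hypothetical equivalence table: the two characters the rule already
# accepts, plus the proposed U+2018.
APOSTROPHE_VARIANTS = {
    "\u0027": "APOSTROPHE",
    "\u2019": "RIGHT SINGLE QUOTATION MARK",
    "\u2018": "LEFT SINGLE QUOTATION MARK",
}


def fold_apostrophes(text, target="\u0027"):
    """Replace every apostrophe variant in text with a single target character."""
    return "".join(target if ch in APOSTROPHE_VARIANTS else ch for ch in text)


# Three spellings of the same place name, differing only in the apostrophe.
spellings = ["Faa\u2018a", "Faa\u2019a", "Faa'a"]
folded = {fold_apostrophes(s) for s in spellings}
print(folded)
```

    After folding, the three spellings compare equal, while the form with the
    glottal letter omitted ("Faaa") remains distinct.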

    I am not suggesting adding the prime symbols, or the spacing acute and grave
    accents, because they are perceived as wrong (although they may be easily
    confused with apostrophes, or present because of the initial use of limited
    legacy charsets).
