Re: apostrophes

From: Philippe Verdy (
Date: Wed May 24 2006 - 18:43:43 CDT


    Regarding locales, isn't the apostrophe also sensitive to locales? (Literally, the term means a mark for the orthographic elision of letters in words, so it is adequate only for describing the function of the ' character in "it's" instead of "it is".)

    The apostrophe with this meaning is NOT a letter, even though it IS orthographic. It is not a punctuation mark either (because of its orthographic importance and the fact that it replaces letters and helps to create compound words). I would say the same of the orthographic hyphen in compound words.

    But the apostrophe has various representations, even within the same language. On most keyboards it is entered as ' but typographically it should be a curly raised comma, distinct from the 9-shaped single-quote punctuation.

    But in Swedish(?) (or other Nordic languages), an elision in the middle of a word is marked by a colon! To denote an elision at the end of a word, the dot is used (also in English, French and most languages written in the Latin script, except for some English cases such as the plural of family names, as in "The Smiths' home", where the regular apostrophe is used to show that it is the plural "s" that is truncated).

    It is in fact difficult to fix rules for the various meanings of the punctuation-like symbols used to mark orthographic elision (dot, vertical quote, right quote, colon) when parsing plain text. The rules are extremely fuzzy, and there is some justification for encoding somewhere whether the symbol is orthographic or punctuation. This would avoid typographical errors when rendering the text or when using spell checkers.
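
    To illustrate how fuzzy such rules are, here is a minimal sketch of one naive heuristic: a mark flanked by letters on both sides is taken as an elision mark, anything else as punctuation. The function name `classify_marks` and the rule itself are assumptions made up for this illustration, not an established algorithm.

```python
# Characters treated as apostrophe-like in this sketch (an assumption):
# U+0027 APOSTROPHE and U+2019 RIGHT SINGLE QUOTATION MARK.
APOSTROPHE_LIKE = "'\u2019"

def classify_marks(text):
    """Return (index, label) pairs for each apostrophe-like character.

    Hypothetical rule: letters on both sides -> "elision",
    otherwise -> "punctuation".
    """
    results = []
    for i, ch in enumerate(text):
        if ch not in APOSTROPHE_LIKE:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before.isalpha() and after.isalpha():
            results.append((i, "elision"))
        else:
            results.append((i, "punctuation"))
    return results

for sample in ["it's", "The Smiths' home", "'tis"]:
    print(sample, classify_marks(sample))
```

    The rule gets "it's" right, but it labels the marks in "The Smiths' home" and "'tis" as punctuation because each lacks a letter on one side. Patching those cases needs word lists or language-specific knowledge, which is exactly the information that plain text does not carry.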

    Why not, then, an invisible combining character with combining class 0, encoded after the punctuation symbol, or a formatting control? This would help mark the meaning of the symbols where the heuristic rules are very fuzzy. We would then need only two new characters:
    * one for specifying that the encoded visible character is actually a punctuation mark.
    * one for specifying that the encoded visible character is instead an orthographic elision mark.
    Such information is in essence NOT glyphic, and is really part of the plain text (but for now it is not correctly represented in plain-text documents, which creates difficulties for spell checkers if they cannot store this information in the plain-text version of the document but only in metadata handled by a higher-level protocol). These controls or invisible marks would THEN be used to select the appropriate glyph for rendering the punctuation-like symbols (including applying the correct locale preferences for the orthographic elision marks, and for the position and orientation of quoting punctuation marks).
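
    A rough sketch of how a renderer could consume such marks follows. No such characters exist in Unicode, so two private-use code points stand in for them here; the names `PUNCT_MARK`, `ELISION_MARK`, `render`, `strip_marks` and the locale table are all hypothetical, chosen only for this illustration.

```python
# Hypothetical stand-ins for the two proposed invisible characters.
# Private-use code points are used because nothing like them is encoded.
PUNCT_MARK = "\uE000"    # preceding symbol is a punctuation mark
ELISION_MARK = "\uE001"  # preceding symbol is an orthographic elision mark

# Assumed locale preference for the elision glyph (illustrative only).
ELISION_GLYPH = {"en": "\u2019"}

def render(text, locale="en"):
    """Pick a glyph for each marked symbol and drop the invisible marks."""
    out = []
    for ch in text:
        if ch == ELISION_MARK:
            if out:
                # Replace the symbol just emitted with the locale's
                # preferred elision glyph; keep it as typed if unknown.
                out[-1] = ELISION_GLYPH.get(locale, out[-1])
        elif ch == PUNCT_MARK:
            continue  # punctuation keeps the symbol as typed
        else:
            out.append(ch)
    return "".join(out)

def strip_marks(text):
    """Return the text as a process unaware of the marks would see it."""
    return text.replace(PUNCT_MARK, "").replace(ELISION_MARK, "")

marked = "it'" + ELISION_MARK + "s a '" + PUNCT_MARK + "test'" + PUNCT_MARK
print(render(marked))
print(strip_marks(marked))
```

    The point of the sketch is that the renderer no longer guesses: the elision in "it's" gets the curly glyph while the quoting marks are left alone, and a spell checker could read the same marks instead of relying on out-of-band metadata.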

    Note that when the symbol is orthographic, it is STILL NOT a letter:

    * The letter is already encoded separately and represents letters that do not exist in some alphabets of the Latin script, such as the Arabic alef or a glottal stop. There is a Latin glottal stop letter, but it is most often used only in IPA applications (the introduction of a capital version of this IPA symbol is very recent in Unicode). For such characters, applying locale preferences would be an error that would change the orthography of words, and possibly even the meaning of whole sentences.

    * Such a real letter does not exist in English and most West European languages. It is used only in special cases such as the transliteration of family names, toponyms, artistic titles, and trademarks. In other cases, each language tends to use a replacement letter according to its own orthographic rules (for example a Q or an H or an accent on the following letter, or sometimes nothing at all, especially at the beginning of transliterated words).
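
    The difference between the punctuation-like symbols and the separately encoded letters can be inspected with Python's standard `unicodedata` module. The selection of code points below is my own assumption about which characters are relevant here, and the helper `describe` is made up for this sketch.

```python
import unicodedata

# Assumed set of relevant characters: the typewriter apostrophe, the curly
# right quote, the modifier letter apostrophe, and the Latin glottal stop
# letters.
CANDIDATES = ["\u0027", "\u2019", "\u02BC", "\u0294", "\u0241"]

def describe(ch):
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in CANDIDATES:
    print(*describe(ch))
```

    U+0027 and U+2019 report punctuation categories (Po and Pf), while U+02BC and the glottal stop letters report letter categories. Neither kind of category expresses "orthographic elision mark", which is the gap described above.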


    This archive was generated by hypermail 2.1.5 : Wed May 24 2006 - 18:48:08 CDT