Exemplifying apostrophes

From: Jonathan Pool (pool@utilika.org)
Date: Sat May 17 2008 - 00:09:27 CDT


    > I'd expect the reason to be very simple: U+0027 APOSTROPHE is easy to key in.

    True, but most characters are difficult for most users to enter. I was asking
    about the consortium’s own pages, which I surmised aren’t difficult for the
    consortium to enter.

    But even the mail from this list seems not to be sent with a Unicode encoding;
    another mystery.

    > if
    > you mean normalization in the sense of transforming to a Normalization
    > Form

    Sorry, I meant normalizing according to a set of Unicode-inspired orthographic

    > Regarding in particular U+0027 APOSTROPHE in existing data, I strongly suggest
    > that if you do not know absolutely and provably what it really “stands for,”
    > don’t touch it. When reading text in a natural language that you know
    > well, you can usually know what U+0027 should be changed to, but if it’s
    > anything that might be a foreign name or code-like notation, it’s easy
    > to go wrong.

    I appreciate your caution. On the other hand, not touching it is a decision,
    too. If different sources represent the same lexeme with different apostrophes
    and we refrain from touching them, then we’re asserting (in our project) that
    these lexemes are distinct, and this interferes with our discovery of
    translation paths through the lexeme.

    > This was discussed some time ago on this list when I raised the issue.
    > Check back the list archives if you are interested in people’s views as
    > they expressed them, but my impression was that this was not regarded as
    > important enough to be done right.

    Apparently so, though some did think it important (e.g., James Kass, who argued--against the
    view of Asmus Freytag--that Web pages are more, not less, subject to an
    expectation of standard conformity than are paper-printed works, and finished
    with: “Web pages on the Unicode site should be exemplary”). For my purposes,
    it would certainly help if they were exemplary, and it casts doubt on the
    claim of practicality of the standard when the standardizing authority doesn’t
    comply. Thanks for the reference.

    > There is this problem in the Ukrainian language, where the apostrophe means a hard sign.
    > How to reproduce it in the original Cyrillic script? It would not be a "diacritic"
    > character as apostrophe, but it is really the original Cyrillic character at
    > the moment (the Ukrainian National Library takes it as an apostrophe, U+0027).

    > Same as in the Latin script: U+2019
    > http://www.unics.uni-hanover.de/nhtcapri/cyrillic-script.html5

    Why? This seems to conflict with the standard as I understand it. I believe
    it’s a letter with a phonological value, not a punctuation mark, so I
    understand the standard to state that the correct character is U+02BC (MODIFIER
    LETTER APOSTROPHE). I believe that this is argued for at
    in message 87. If I’m incorrect, I’d appreciate an explanation. Thanks.
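
The letter-versus-punctuation distinction can be checked against the Unicode Character Database, for example with Python's standard unicodedata module. The helper name describe is a hypothetical choice for this sketch; the properties reported depend on the Unicode version bundled with the interpreter, though these three characters have long been stable.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# The three candidates discussed in this thread.
for ch in ("\u0027", "\u2019", "\u02BC"):
    print(*describe(ch))
```

In the character database, U+0027 and U+2019 have punctuation categories (Po and Pf), while U+02BC has category Lm (modifier letter), so only U+02BC behaves as a letter for purposes such as word-boundary detection.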
