Re: New translation posted

From: Hans Aberg
Date: Mon Feb 05 2007 - 11:39:53 CST

    On 5 Feb 2007, at 14:40, Michael Maxwell wrote:

    > But how are you going to eliminate them "in the first place"? I see
    > two choices: either automatically, or by hand. If it can be done
    > automatically in the first place, then it could be done
    > automatically during parsing.

    In the case of punctuation apostrophes, I am not sure it can always
    be done automatically, which would justify allowing it to be done by
    hand, that is, by adding such a separate character.

    > And I suspect the chances of getting document authors
    > to do it right by hand are slim, particularly since the two characters
    > would look identical on the screen (at least in a WYSIWYG editor; I
    > suppose you could use a character entity in a non-WYSIWYG editor). And
    > if people mess up, then the parsing problem is even worse, because the
    > parser can't know which of the two characters it should be.

    There are GUI techniques for checking matching pairs, already used in
    editors for programming languages. Typically, when the closing
    member of a pair is entered, the opening member is scrolled into
    view and blinked, or something similar.
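
    As a rough illustration of that kind of check, here is a minimal
    sketch in Python. The helper name find_opening and the PAIRS table
    are made up for this example and are not taken from any particular
    editor; a real editor would also have to skip strings and comments.

```python
# Hypothetical pair table: closing character -> opening character.
PAIRS = {")": "(", "]": "[", "}": "{", "\u2019": "\u2018"}

def find_opening(text, close_index):
    """Return the index of the opener matching text[close_index], or None."""
    closer = text[close_index]
    opener = PAIRS.get(closer)
    if opener is None:
        return None  # not a closing character we know about
    depth = 0
    # Scan backwards, counting nested pairs of the same kind.
    for i in range(close_index - 1, -1, -1):
        if text[i] == closer:
            depth += 1
        elif text[i] == opener:
            if depth == 0:
                return i
            depth -= 1
    return None  # unmatched closer

print(find_opening("f(a[1], (b))", 11))          # 1
print(find_opening("\u2018don\u2019t\u2019", 4))  # 0
print(find_opening("\u2018don\u2019t\u2019", 6))  # None
```

    The last two calls show the problem under discussion: when U+2019
    serves as both closing quote and apostrophe, the apostrophe in
    "don't" is paired with the opening quote, and the real closing
    quote is left unmatched.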

    But if the rendering is identical, it might be difficult to catch
    mistakes, which may happen even with lookalikes. For example,
    swapping the letter O and the digit 0 may result in a hard-to-catch
    error.

    Compare also the at least two uses of ".": as a sentence-end marker
    and as an abbreviation marker. The dots are typeset identically, but
    the spacing around them differs, signaling a semantic difference.
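
    To make the two readings of "." concrete, here is a minimal sketch
    in Python of a naive classifier. The function classify_periods and
    the ABBREVIATIONS set are assumptions made for this example; the
    set is far from complete, and a lookup like this is only a
    heuristic.

```python
# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "Dr.", "Mr.", "Mrs."}

def classify_periods(text):
    """Return (index, kind) for each '.' ending a whitespace-separated token."""
    results = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate this token in the original text
        pos = start + len(token)
        if token.endswith("."):
            kind = "abbreviation" if token in ABBREVIATIONS else "sentence end"
            results.append((pos - 1, kind))
    return results

print(classify_periods("See Dr. Smith. He left."))
# [(6, 'abbreviation'), (13, 'sentence end'), (22, 'sentence end')]
```

    A sentence that ends in "etc." gets a single dot that plays both
    roles, and this sketch reports it only as an abbreviation, which is
    the kind of case that may not be resolvable automatically.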

    One can then play the game in different ways: there are different
    character types, for example input, semantic, and rendering. U+0027
    might perhaps be called an input character and U+2019 a rendering
    character, with no semantic apostrophe character in the set. This
    difference between character types is more apparent in math: U+225D,
    "equal to by definition", is clearly a semantic character, but
    U+2254, "colon equals", which can also be used to indicate a defined
    mathematical object, is clearly named for its rendering. In math,
    the usage of symbols is in flux, so there is no good universal
    resolution of the topic: it must be handled character by character.
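
    The formal names and general categories of the four characters
    mentioned can be looked up with Python's standard unicodedata
    module. The helper describe is made up for this example. Note that
    the database records names and categories only; the split into
    input, semantic, and rendering characters is an informal reading,
    not a Unicode property.

```python
import unicodedata

def describe(code_point):
    """Return (U+XXXX label, Unicode name, general category) for a code point."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), unicodedata.category(ch))

for cp in (0x0027, 0x2019, 0x225D, 0x2254):
    print(*describe(cp))
# U+0027 APOSTROPHE Po
# U+2019 RIGHT SINGLE QUOTATION MARK Pf
# U+225D EQUAL TO BY DEFINITION Sm
# U+2254 COLON EQUALS Sm
```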

       Hans Aberg
