Re: Semantics of Standards (was Re: Character found in national standard not defined in Unicode?)

From: George W Gerrity (
Date: Fri Apr 25 2008 - 21:39:07 CDT

  • Next message: Benjamin M Scarborough: "Zhuang tones three and four"

    I have renamed the subject, since both JFC Morfin's comments and my
    reply were off topic, and probably belong in a different forum. This
    is my last comment in this thread.

    On 2008-04-26, at 08:22, David Starner wrote:

    > On Fri, Apr 25, 2008 at 11:30 AM, George W Gerrity
    > <> wrote:
    >> To people writing specifications for Programming Languages, the
    >> difficulty
    >> of specifying meaning (or correct behaviour) in a Natural Language
    >> is well
    >> known. That is why specifications for newer Programming Languages are
    >> written in a meta-language, whose semantics and syntax is defined
    >> abstractly
    >> and Mathematically.
    > That's not how I would describe it. The grammar of Algol 60 was
    > described with Backus-Naur form, and just about every standardized
    > language since has used BNF or the equivalent to specify the grammar.
    > The meaning, however, is a much hairier beast. IIRC, Algol 68 tried to
    > formally specify meaning, but was considered a failure. Most
    > standardized languages since have used English to specify meaning
    > instead of any mathematical meta-language.

    I should have been more careful in my discussion: grammar is usually
    specified in BNF, but a number of formalisms have been used to
    specify semantics, especially SGML in newer languages. Algol 68 did
    indeed use a formal meta-language to describe semantics, and whether
    or not it was a failure is moot. My opinion was, and remains, that
    the effort to use and understand a meta-language proved too great
    both for the specifiers and the users, even though its use did
    indeed provide the required authority as to compiler correctness.
    The use of a compiler for a virtual machine to specify semantics, as
    in Pascal and Java, seems to be an easier approach, because it makes
    the Language Developer's understanding the Norm (per JFC Morfin).
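    The distinction above can be made concrete with a minimal sketch
    (my illustration, not from the original discussion): the BNF below
    fixes the grammar of a toy expression language but says nothing
    about meaning, while the evaluator supplies the semantics
    operationally, much as a reference implementation or virtual
    machine does for Pascal or Java.

    ```python
    # Grammar (BNF) -- syntax only, no meaning:
    #   <expr>   ::= <term> { ("+" | "-") <term> }
    #   <term>   ::= <factor> { ("*" | "/") <factor> }
    #   <factor> ::= NUMBER | "(" <expr> ")"
    import re

    def tokenize(src):
        # Split the source into numbers, operators, and parentheses.
        return re.findall(r"\d+|[()+\-*/]", src)

    def evaluate(src):
        toks = tokenize(src)
        pos = 0

        def peek():
            return toks[pos] if pos < len(toks) else None

        def eat():
            nonlocal pos
            tok = toks[pos]
            pos += 1
            return tok

        # Each grammar rule becomes a function; what the function
        # *computes* is the semantics the BNF alone never specified.
        def expr():
            val = term()
            while peek() in ("+", "-"):
                if eat() == "+":
                    val += term()
                else:
                    val -= term()
            return val

        def term():
            val = factor()
            while peek() in ("*", "/"):
                if eat() == "*":
                    val *= factor()
                else:
                    val //= factor()  # integer division: a semantic choice
            return val

        def factor():
            if peek() == "(":
                eat()
                val = expr()
                eat()  # consume ")"
                return val
            return int(eat())

        return expr()

    print(evaluate("2 + 3 * (4 - 1)"))  # prints 11
    ```

    Note that the choice of integer versus floating-point division in
    term() is invisible to the grammar: two compilers could accept the
    same BNF yet disagree on meaning, which is exactly the gap a
    semantic specification must close.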

    The problem with standards specification, especially when a
    standard refers directly or indirectly to Natural Languages (as the
    Unicode Standard does), is that natural languages are
    context-dependent in the widest possible sense (including the
    cultural, geographical, and particular subject context), and none
    of our formal tools are powerful enough to deal with this context
    sensitivity, including JFC Morfin's proposed solution. The best we
    can do is to use formal methods to describe the semantics of the
    non-context-sensitive parts, usually by means of an algorithm, and
    to use a Natural Language very carefully to provide semantics for
    the context-sensitive parts. If the particular Natural Language
    chosen for the standard's specification is unable to express the
    semantics, then they cannot be specified by the standard. That is
    why, for instance, style is removed from Unicode and specified at a
    higher level, where the producer (artist?) of the text can choose
    to exercise options that are context-sensitive. That is also why
    the dividing line between encoding and not encoding a specific
    glyph is so difficult to draw, and why the standard is arbitrary at
    this level.

    Dr George W Gerrity Ph: +61 2 156 0286
    GWG Associates Fax: +61 2 156 0286
    4 Coral Place Time: +10 hours (ref GMT)
    Campbell, ACT 2612 PGP RSA Public Key Fingerprint:
    AUSTRALIA 73EF 318A DFF5 EB8A 6810 49AC 0763 AF07

    This archive was generated by hypermail 2.1.5 : Fri Apr 25 2008 - 21:43:23 CDT