RE: IPA Null Consonant

From: Kent Karlsson
Date: Wed May 28 2003 - 16:51:11 EDT

    > I absolutely concur with Peter, Michael, and Lukas that
    > U+2205 EMPTY SET
    > is the correct and intended character to deal with this semantic
    > of null morphemes and other linguistic "zeroes" in technical
    > linguistic representation.

    And I (still!) very strongly disagree. The empty set symbol stands
    for the empty set (also written {}). But there is no set here, let alone
    an empty one. Possibly an empty string (of phonetic symbols?).
    Written as '' or "" in your favourite programming language, and
    conventionally written as a lowercase epsilon (ε) in math contexts.
    (That does not make the empty string equal to a string consisting
    of the letter ε, of course!)
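
    The distinction can be made concrete with a minimal Python sketch. The names EMPTY and EPSILON are illustrative only and do not come from any standard:

```python
# Illustrative names only: EMPTY is the empty string, EPSILON is the
# one-character string containing U+03B5 GREEK SMALL LETTER EPSILON.
EMPTY = ""
EPSILON = "\u03b5"

# The empty string has no characters; the epsilon string has one.
print(len(EMPTY), len(EPSILON))

# The empty string is the identity for concatenation; the string "ε" is not.
print("ab" + EMPTY == "ab")
print("ab" + EPSILON == "ab")
```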

    > In response to an earlier comment in the thread, I also agree that
    > the annotation in the names list for U+2205 should be updated
    > (for a future version -- it's too late for Unicode 4.0) to
    > indicate this explicitly, so that we won't have to revisit this
    > issue a few years down the road.

    Please not on the empty set symbol. Using the empty set symbol for
    this is of course possible, but inappropriate. Rather use CAPITAL

    > > WITH STROKE] are both ruled out as their semantics is
    > totally wrong.

    Not at all (as seen in the example Jarkko quoted!). In Danish and Norwegian,
    yes. But in Swedish and Finnish that vowel is written ö (and Ö).
    But ö in e.g. English stands for a completely different vowel, an o
    (or indeed even an å!).

    > > - ∅ [EMPTY SET] is the best choice if a single character has to be
    > > chosen from the current Unicode repertoire.

    Possible, but very inappropriate, for semantic, typographic, and
    other processing reasons.
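
    One way to see the processing difference is to compare the character properties of the two candidates. This is a rough sketch that assumes Python 3 with its standard unicodedata and re modules; the helper names and the sample strings are made up for illustration:

```python
import re
import unicodedata

EMPTY_SET = "\u2205"      # EMPTY SET
CAP_O_STROKE = "\u00d8"   # LATIN CAPITAL LETTER O WITH STROKE

def describe(ch):
    """Return (general category, is-alphabetic, lowercase form) for one character."""
    return unicodedata.category(ch), ch.isalpha(), ch.lower()

def is_single_word(text):
    """True if the whole string consists of word characters (\\w) only."""
    return re.fullmatch(r"\w+", text) is not None

# The symbol is a math symbol with no case; the letter is a cased letter.
print(describe(EMPTY_SET))
print(describe(CAP_O_STROKE))

# A letter stays inside a word-like string; a math symbol breaks it up.
print(is_single_word("ab" + CAP_O_STROKE + "cd"))
print(is_single_word("ab" + EMPTY_SET + "cd"))
```

    How much this matters depends on the software involved; word selection, searching, and case mapping are the likely places for the difference to show up.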

    > Yes. And Pullum's discussion of this explicitly calls out the
    > problem with confusable glyphs and notes that it has been
    > a persistent typesetting problem in linguistics:
    > "Mentioning the null sign here allows us to stress that it is
    > distinct from all four of the following visually rather similar
    > characters: Phi [ U+0278 ], Barred O [ U+0275 ], Slashed O
    > [ U+00F8 ], and Theta [ U+03D1, but showing the straight bar
    > glyph variant ]. Typesetting errors in connection with these
    > symbols are unfortunately fairly common."

    But capital "slashed o" (U+00D8) is not mentioned... And that letter
    would be entirely appropriate for this purpose **in the contexts** where
    it would stand for a "null consonant" (or empty string) in linguistics.
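
    Since the glyphs are so easily confused, it can help to check which character is actually in the text. A small sketch, assuming Python 3 and its standard unicodedata module, that lists the code points cited above together with U+2205 and U+00D8 (the helper name name_table is hypothetical):

```python
import unicodedata

# Code points cited in the quotation, plus U+2205 and U+00D8 from this thread.
CODE_POINTS = [0x0278, 0x0275, 0x00F8, 0x03D1, 0x2205, 0x00D8]

def name_table(code_points):
    """Pair each code point, formatted as U+XXXX, with its Unicode character name."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in code_points]

for label, name in name_table(CODE_POINTS):
    print(label, name)
```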

    > As Peter pointed out, linguists have also grown used to the
    > narrow glyph for their linguistic zero as the result of
    > many years of typewriter and/or daisywheel printer practice
    > of typesetting this symbol as <0, BACKSPACE, />, when nothing
    > better was available.

    Which would suggest that <DIGIT ZERO, COMBINING LONG SOLIDUS
    OVERLAY> would also be appropriate.
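
    For reference, that sequence can be written out as follows. This is a sketch assuming Python 3 and its standard unicodedata module; the constant name ZERO_SLASHED is made up for the example. The sequence stays two code points under NFC, because no precomposed "slashed zero" character exists to compose into:

```python
import unicodedata

# DIGIT ZERO followed by COMBINING LONG SOLIDUS OVERLAY.
ZERO_SLASHED = "0\u0338"

# Two code points: a base digit and a nonspacing combining mark.
print(len(ZERO_SLASHED))
print([unicodedata.name(c) for c in ZERO_SLASHED])
print([unicodedata.category(c) for c in ZERO_SLASHED])

# NFC leaves the sequence unchanged.
print(unicodedata.normalize("NFC", ZERO_SLASHED) == ZERO_SLASHED)
```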

    > The fact that the EMPTY SET symbol gets used in many different
    > ways in different disciplines, including linguistics, no
    > more requires the encoding of additional characters than does the
    > fact that U+0023 NUMBER SIGN (#) is also used conventionally
    > in linguistics as a symbol having nothing to do with numbers --
    > it indicates boundaries in phonology and morphology, instead.

    The EMPTY SET symbol was invented by the Bourbaki group for
    representing the empty set (though the design was inspired by
    the Norwegian/Danish letter). It does not appear to have wandered
    into linguistics in any way (except by occasional typographic mistake,
    and that does not count), even though there is use of a similar-looking
    symbol. (The # is a modern version of the L B BAR SYMBOL (℔, somewhat
    better glyph in the 4.0 charts than the 3.0 charts), for "libra" or pound; its
    use as "number sign" appears rather modern and American (I'm always
    appalled by the "UTR #nn" references). Compare also U+2317 VIEWDATA

    I think it would be less problematic to use the letter Ø for the empty
    set (in a math context), than to use the EMPTY SET symbol (∅) for any
    linguistic entity in a word-like linguistic context.

                    /kent k
