Re: IPA Null Consonant

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue May 27 2003 - 20:57:43 EDT


    Thomas Widmann continued:

    > Peter_Constable@sil.org writes:
    >
    > > > >Yes, I think you're right that an annotation is best -- but only
    > > > >if EMPTY SET is indeed the right character. I'm increasingly of
    > > > >the opinion that a different character might be needed.
    > > >
    > > > I would disagree.
    > >
    > > As would I.
    >
    > Oh dear, if you both disagree with me, my chances of getting through
    > with this look slim indeed... :-)

    O.k., I've finally read the thread, and it's time for another
    linguist to chime in.

    I absolutely concur with Peter, Michael, and Lukas that U+2205 EMPTY SET
    is the correct and intended character for the semantics
    of null morphemes and other linguistic "zeroes" in technical
    linguistic representation.

    In response to an earlier comment in the thread, I also agree that
    the annotation in the names list for U+2205 should be updated
    (for a future version -- it's too late for Unicode 4.0) to
    indicate this explicitly, so that we won't have to revisit this
    issue a few years down the road.

    >
    > But I'm wondering why.
    >
    > I think we all agree on the following:
    >
    > - Ø [LATIN CAPITAL LETTER O WITH STROKE] and ø [LATIN SMALL LETTER O
    > WITH STROKE] are both ruled out as their semantics is totally wrong.
    >
    > - 0 [DIGIT ZERO] is also ruled out because it looks wrong in most
    > fonts (and one might argue that the semantics isn't exactly right,
    > either).
    >
    > - ∅ [EMPTY SET] is the best choice if a single character has to be
    > chosen from the current Unicode repertoire.

    All correct.

    >
    > - But while ∅ [EMPTY SET] is normally just as wide as it is tall (it's
    > really just a circle with a stroke), the null symbol as used in
    > linguistics frequently looks more like 0 [DIGIT ZERO] with an added
    > stroke. (But many variations exist, including ∅ [EMPTY SET], ø
    > [LATIN SMALL LETTER O WITH STROKE] and other symbols, most of which
    > can be explained by typesetters and word-processing programs that
    > didn't know what they're doing.)

    Yes. And Pullum's discussion of this explicitly calls out the
    problem with confusable glyphs and notes that it has been
    a persistent typesetting problem in linguistics:

      "Mentioning the null sign here allows us to stress that it is
       distinct from all four of the following visually rather similar
       characters: Phi [ U+0278 ], Barred O [ U+0275 ], Slashed O
       [ U+00F8 ], and Theta [ U+03D1, but showing the straight bar
       glyph variant ]. Typesetting errors in connection with these
       symbols are unfortunately fairly common."
       
    Pullum's representative glyph for the "null sign" is, as
    Thomas notes, of narrow aspect, and is essentially the
    slashed zero glyph, often seen in typesetting linguistic
    works. The alternative glyph, cited in Dinnsen, 1974, is
    the Symbol font glyph (0xC6), with the large round circle
    and a solidus overlay at a 45 degree angle. Which of these
    shows up in linguistic typography, as in many instances of
    typesetting linguistic material, is often a matter of what
    the compositor had available.

    Speaking in linguistic terms, what we have here is two
    graphemes, with an etic overlap in their actual glyphs used
    for display:

    U+0030 DIGIT ZERO
        common glyphs: zero without slash, zero with slash, zero with dot
           (where the added slashes or dots are usually ad hoc
            devices to minimize confusion with the letter O)
            
    U+2205 EMPTY SET
        common glyphs: circle with 45 degree slash (PostScript symbol font),
                       zero with slash

    So if you just approach the problem graphically, you get an
    overlap, and there are glyphs which cannot be distinguished.
    But the *range* of acceptable glyphs for the two *characters*
    is distinct. A "zero with dot" glyph would never be appropriate
    for U+2205, for example.
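
    As a rough illustration of the character-versus-glyph point, the
    sketch below uses Python's standard unicodedata module to look up
    the candidates named in this thread. The dictionary labels and the
    describe() helper are illustrative names of my own, not anything
    from the standard; the point is only that the encoded identities
    (name, general category) differ even where the glyphs overlap.

```python
import unicodedata

# Code points discussed in the thread; the labels are informal.
CANDIDATES = {
    "digit zero": "\u0030",
    "empty set": "\u2205",
    "capital O with stroke": "\u00D8",
    "small o with stroke": "\u00F8",
}


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# A decimal digit (Nd), a math symbol (Sm), and two letters (Lu, Ll):
# four distinct characters, whatever a given font draws for them.
for label, ch in CANDIDATES.items():
    print(label, describe(ch))
```

    Nothing in these properties records which glyph a font uses, which
    is why a slashed-zero glyph can legitimately turn up for more than
    one of these characters.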

    As Peter pointed out, linguists have also grown used to the
    narrow glyph for their linguistic zero as the result of
    many years of typewriter and/or daisywheel printer practice
    of typesetting this symbol as <0, BACKSPACE, />, when nothing
    better was available.
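
    To make the overstrike concrete: spelled out as code points, the
    typewriter sequence is three separate characters, and nothing in
    Unicode normalization turns it into U+2205. The sketch below
    checks that with Python's unicodedata module. The second string,
    digit zero followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, is
    a hypothetical modern imitation of the overstrike that I am
    adding for comparison; it is not something proposed in the thread.

```python
import unicodedata

EMPTY_SET = "\u2205"

# The typewriter-era overstrike as code points: 0, BACKSPACE, /
overstrike = "\u0030\u0008\u002F"

# Hypothetical imitation: DIGIT ZERO + COMBINING LONG SOLIDUS OVERLAY
combining = "\u0030\u0338"


def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def same_as_empty_set(text):
    """True only if the text normalizes to the single character U+2205."""
    return nfc(text) == EMPTY_SET


for label, text in [("overstrike", overstrike), ("combining", combining)]:
    code_points = " ".join(f"U+{ord(c):04X}" for c in nfc(text))
    print(label, code_points, same_as_empty_set(text))
```

    Both sequences stay as they are under NFC, so text keyed this way
    remains distinct from U+2205 as encoded data, however similar the
    printed result looks.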
                      
    >
    > - Furthermore, semantically an empty set is not really the same thing
    > as a null symbol. (They both represent 'nothing', but so does 0
    > [DIGIT ZERO] and possibly other Unicode characters as well.)

    True enough. Linguists (and logicians and mathematicians) are
    very adept at discovering and representing many different kinds
    of 'nothing'. For linguists, in particular, the 'nothing's of
    interest are usually significant positions in structural
    patterns whose surface manifestation is no sound (or no
    written form). The significance is in the systematic contrast
    with a 'something'.

    However, for *character encoding* it is inappropriate to start
    trying to establish a distinct encoded character for each
    possible semantic distinction that could be associated with
    a concept of zero or nothing. The appropriate approach is to
    examine the written forms and typographical conventions for
    same or different distinctions. And the net result, I believe,
    is to conclude that there are two *characters*, with somewhat
    confusing overlap in their glyphic representation. (See above.)

    The fact that the EMPTY SET symbol gets used in many different
    ways in different disciplines, including linguistics, no
    more requires the encoding of additional characters than does the
    fact that U+0023 NUMBER SIGN (#) is also used conventionally
    in linguistics as a symbol having nothing to do with numbers --
    it indicates boundaries in phonology and morphology, instead.

    In the case of the "linguistic zero", the discussion is
    further muddled by the terminology per se. Phonologists
    and morphologists often talk about "zeroes" in their
    analyses -- fully aware that these "zeroes" have nothing
    to do with numeric values. And when their work then gets
    typeset with "slashed-zero" glyphs -- possibly even by their
    explicit preference -- the situation can get even more
    confused. But this would not be helped, for linguists or
    anyone else, by trying to introduce yet another character
    for NULL SYMBOL, whose only glyph would be the "slashed-zero"
    glyph. That would just make the visual overlap problem worse
    without helping at all in preserving the text distinctions
    required.

    > If you agree with all of the above, I'm wondering what the argument is
    > against a new Unicode character, called NULL or NULL SYMBOL.

    Just provided.

    > Surely
    > if it looks different from any existing character and has a
    > well-defining meaning also not covered, there must be a good case for
    > adding it...?

    Nope.

    --Ken (as his linguist avatar)


