Re: (SC2WG2.609) New contribution N2705

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Feb 17 2004 - 20:18:12 EST

    Peter Kirk pointed out:

    > This is not some kind of unusual orthography but a
    > specialist scientific notation. It is the same notation as h1, h2, h3 or
    > ha, hb, hc etc (the second character subscripted in each case) used in
    > all kinds of notational conventions but primarily mathematical and
    > scientific ones. Some linguistics textbooks are full of this kind of
    > notation. For an example chosen almost at random, I found the following
    > in an old paper by Kenneth Pike (in Ruth M. Brend ed. "Advances in
    > Tagmemics", North-Holland 1974, p.238):
    >
    > (2) eMk = eaTCaf, eaTCgf, egTCaf, egTCgf
    >
    > where all the lower case letters are subscripted, and examples of this
    > in which the word "catch" is followed by subscript af or gf.

    And I agree with him. The Indo-Europeanist usage is just a
    very restricted subset that shades off, even within historical
    linguistics, into the general conventions of mathematical
    and logical formulation for expressing relationships of various sorts.

    Another example comes from a paper about morphological analysis that
    clearly involves mathematical formulations:

    http://www-2.cs.cmu.edu/~alavie/Sem-MT-wshp/ltai+Segal_paper.pdf

    You could start down the road of thinking that the formulations
    of T<sub>1</sub>, T<sub>2</sub>, and so on should just use
    the compatibility subscript digits in Unicode. Then you hit
    T<sub>i</sub>. Is that actually U+1D62 LATIN SUBSCRIPT SMALL LETTER I
    or just a subscripted U+0069? And then you clearly run out of
    gas when you hit:

         t<sub>nm<sub>n</sub></sub>
         
    with recursive subscripting.
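
    A rough sketch of where that road ends, for anyone who wants to
    poke at it. This is only an illustration: it asks Python's
    unicodedata module which characters carry a <sub> compatibility
    decomposition, so the answers depend on the Unicode version bundled
    with the interpreter, and the names subscript_map and
    to_plain_subscript are made up for this example.

```python
import sys
import unicodedata


def subscript_map():
    """Map each base character to its single-character subscript clone, if any.

    Assumption: a character counts as a "subscript clone" when its
    compatibility decomposition is tagged <sub> and has exactly one target.
    """
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<sub> "):
            targets = decomp.split()[1:]
            if len(targets) == 1:
                table.setdefault(chr(int(targets[0], 16)), ch)
    return table


SUBSCRIPTS = subscript_map()


def to_plain_subscript(text):
    """Return text spelled with subscript clones, or None if any are missing."""
    out = []
    for ch in text:
        if ch not in SUBSCRIPTS:
            return None  # no encoded subscript form for this character
        out.append(SUBSCRIPTS[ch])
    return "".join(out)


if __name__ == "__main__":
    for sample in ["1", "i", "F", "1/3"]:
        print(repr(sample), "->", repr(to_plain_subscript(sample)))
```

    The digit and the i each map to a single encoded character, but
    a capital F or an expression like 1/3 comes back as None. And a
    flat character mapping like this has no way at all to express the
    nesting in the recursive case above.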

    > My point here is that if we once start on encoding subscript letters
    > used in specialist scientific notation, there is no easy place to stop.
    > Either we need to accept the principle that subscripts are encodable and
    > set aside space for a whole alphabet of them (and an upper case alphabet
    > and a Greek alphabet as well, plus punctuation); or else we need to say
    > from the start that these things are not plain text and should not be
    > encoded in Unicode.

    It may be reasonable for Michael to argue for the subscript a, e, and
    o for Indo-European, since he already got a subscript i and u encoded
    for the UPA. Arguably, the subscript a, e, and o *are* phonetic
    modifier letters, since they represent hypothesized vowel-coloring
    of the laryngeal symbol. The subscript x is trickier, since it
    is an algebraic substitution for (a ~ e ~ o), so we are skating
    on thin ice there, with a notation that is arguably not a
    phonetic modifier letter. And the subscript / is over the edge,
    as far as I am concerned. It clearly is introducing a generic
    notational convention into the realm where we are expecting only
    discrete modifier letters to require encoding as separate
    characters. And if I run into an Indo-Europeanist notation of
    the alternations such as:

    *h<sub>1/3</sub>

    or

    *dhug'hH(<sub>e/o</sub>)ter

    what is to guarantee that I won't find alternative representations
    of such formulations using "~" instead of "/", for example? Do
    we then also need a subscript tilde to handle that?

    Furthermore, Michael carefully dodged the point that all of these
    Indo-European sources are *already* fonted, styled text. They
    are *not* plain text, but mix italic citations with Roman forms.
    Unless we are going to also head down the road of plain text
    italic letter clones for Indo-European, all of this material already
    has to be dealt with as rich text.

    The proposal states:

    "Styled text is not seen as appropriate for these; Indo-Europeanists
    already make use of the subscript digits, and superscript h and w
    and so on, already encoded. The characters proposed here are
    required for plain-text representation of Indo-European reconstructed
    material."

    I concur that superscript h and w and so on are o.k. -- they truly
    are modifier letters and appropriate in transcriptional plain
    text. Nobody is arguing about that point.

    But I think it is a mistake to be using the compatibility
    subscript digits for generic subscripting. Of course, I can't
    help it if people are already doing so, but it gets us into this
    conundrum of people expecting any subscripted expression to
    be expressible in plain text, and that is just clearly wrong --
    it isn't generic or scalable. And it results in people coming
    back to the table asking for more of them every time some
    community is found making some other use of them. As Peter Kirk
    pointed out, this kind of use of subscripting in linguistic
    material is widespread.
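
    One way to see why the compatibility subscripts are a shaky
    vehicle for generic subscripting: they carry compatibility
    decompositions, so any process that applies compatibility
    normalization folds them back to ordinary characters. A minimal
    sketch, using Python's standard unicodedata module (the helper
    name fold_compat is hypothetical, chosen just for this example):

```python
import unicodedata


def fold_compat(text):
    """Apply NFKC normalization, which replaces compatibility characters."""
    return unicodedata.normalize("NFKC", text)


h1 = "h\u2081"  # h followed by U+2081 SUBSCRIPT ONE
ti = "T\u1d62"  # T followed by U+1D62 LATIN SUBSCRIPT SMALL LETTER I

# After folding, the subscript distinction is gone from the text.
print(fold_compat(h1) == "h1")
print(fold_compat(ti) == "Ti")
```

    Markup such as h<sub>1</sub> in rich text does not have this
    problem, since the subscripting lives outside the character data.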

    Take an example, pulled more or less at random off the web,
    Topics in Tiberian Biblical Hebrew Metrical Phonology and
    Prosodics, by Henry Churchyard (a 1999 Ph.D. dissertation).

    http://www.crossmyt.com/hc/linghebr/

    (in case anyone wishes to check up on me)

    This uses conventions fairly widespread in metrical phonology,
    where F stands for foot, lowercase-sigma stands for syllable,
    and lowercase-mu stands for mora. If you examine the document, you
    find instances of all 3 subscripted in various combinations,
    in addition to the typical usage of subscripted numbers and
    subscripted i to indicate particular consonants and matching
    consonants:

       -C<sub>i</sub>C<sub>i</sub>#
       
    So you find constructs like:

       [<sub>F</sub>[<sub>sigma</sub>mu<sub>sigma</sub>]
          [<sub>sigma</sub>mumu<sub>sigma</sub>]<sub>F</sub>]
          
    And:

       sigma-with-combining-breve<sub>mu</sub>
       
       to represent: "a light syllable which is not a bimoraic-trochee
                      reduction structure head"
                      
    Now, if we do as Michael subsequently suggested:

    > Or we do what we have done so far. Encode what people have been using.

    are we then missing subscript-F, subscript-sigma, and subscript-mu for
    metrical phonologists?

    In case you missed it, that was a rhetorical question, and the
    answer to it should be no. :-)

    By the way, as I indicated, the case for the subscript-a, e, and o
    seems better to me. The above dissertation, for example, makes use
    of the subscript-a as a transcriptional notation for the furtive
    patah -- the kind of evidence that argues *for* such a character
    as useful for a plain text representation of linguistic
    transcription.

    --Ken


