RE: Things to do with text

From: Michael Maxwell (mmaxwell@casl.umd.edu)
Date: Wed Nov 14 2007 - 11:17:25 CST


    John Hudson wrote:
    > So yes, I think the sort of things corpus
    > linguists might do would be of interest, and I'd like to gain
    > an understanding of some examples.

    Just a few ideas off the top.

    Interlinear text: there are some examples (in the context of how best to do it) at
       http://www.eva.mpg.de/lingua/files/morpheme.html
    and here's a collection effort for interlinear text:
       http://www.csufresno.edu/odin/
    Since these are collected from on-line papers, they may help show how linguists actually use interlinear text.
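
    For readers who have not seen interlinear text before, here is a minimal sketch of the idea: a source tier split into morphemes, a gloss tier aligned with it morpheme by morpheme, and a free translation. It is not taken from either page above. The class name InterlinearLine, its fields, and the hyphen-only segmentation are assumptions made for illustration, and the Turkish sentence is just a convenient example.

```python
from dataclasses import dataclass


@dataclass
class InterlinearLine:
    """One line of interlinear text (hypothetical structure, for illustration)."""
    words: list[str]    # source words, morphemes separated by "-"
    glosses: list[str]  # one gloss per word, hyphenated the same way
    translation: str    # free translation of the whole line

    def aligned_morphemes(self) -> list[tuple[str, str]]:
        """Pair each morpheme with its gloss; fail if the tiers do not line up."""
        if len(self.words) != len(self.glosses):
            raise ValueError("word tier and gloss tier differ in length")
        pairs = []
        for word, gloss in zip(self.words, self.glosses):
            morphs, tags = word.split("-"), gloss.split("-")
            if len(morphs) != len(tags):
                raise ValueError(f"misaligned word: {word!r} vs {gloss!r}")
            pairs.extend(zip(morphs, tags))
        return pairs

    def render(self) -> str:
        """Lay the tiers out in columns, one column per word."""
        widths = [max(len(w), len(g)) for w, g in zip(self.words, self.glosses)]
        top = "  ".join(w.ljust(n) for w, n in zip(self.words, widths)).rstrip()
        mid = "  ".join(g.ljust(n) for g, n in zip(self.glosses, widths)).rstrip()
        return f"{top}\n{mid}\n'{self.translation}'"


line = InterlinearLine(
    words=["kitab-ı", "oku-du-m"],
    glosses=["book-ACC", "read-PST-1SG"],
    translation="I read the book.",
)
print(line.render())
```

    The alignment check is the point: most of the applications of interlinear text depend on each gloss being recoverably tied to a particular morpheme, which plain visual spacing does not guarantee.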

    On linguistic annotation in general, here's a comprehensive but dated web page:
       http://www.ldc.upenn.edu/annotation/
    Some of the links there that might provide more useful examples include CHILDES and Transcriber.

    "Corpus linguistics": googling this will return lots of hits :-(, but this page
       http://bowland-files.lancs.ac.uk/monkey/ihe/linguistics/contents.htm
    has some useful links showing how corpora are actually used. (The page is a supplement to a book on the topic.) Section 4 (see link) is perhaps particularly useful for finding linguistic uses of corpora.

    There is lots more, but I fear I don't have time to track down web pages that show good examples--most descriptions of such tools seem to presuppose an audience that already knows what it wants to do.

    Applications of annotated text (interlinear text being one kind of annotated text) include linguistic analysis; dictionary building (particularly for field linguists); machine learning of grammars, named entity extraction, information extraction; and machine learning for statistical machine translation (parallel text corpora in particular are used for this purpose).

       Mike Maxwell
       CASL/ U Md


