Re: Multi-lingual corpus?

From: Ken Krugler (ken@transpac.com)
Date: Wed Aug 24 2005 - 13:25:50 CDT

    >>Kevin Burton has created an open source language detector written
    >>in Java (see http://www.feedblog.org/2005/08/ngram_language_.html)
    >>and he's asking for contributions of sample data for additional languages.
    >
    >Besides his blog page and the existing SourceForge project name, he
    >has not provided anything for now (there's no source and no demo
    >available, not even an alpha version).

    The code is available via CVS. You can view it at:

    http://cvs.sourceforge.net/viewcvs.py/ngramcat/ngramcat/

    >I wonder if it's a good idea to provide him with such data, if he
    >does not want to publish anything in fact (there may be legal issues
    >with his source, notably if he used copyrighted materials such as
    >the paper he is citing).

    Leaving aside any legal speculations, my query was for references to
    _open_ sources of text ("...public data sets...").

    -- Ken

    -- 
    Ken Krugler
    TransPac Software, Inc.
    <http://www.transpac.com>
    +1 530-470-9200
    

