Re: Multi-lingual corpus?

From: Ken Krugler
Date: Wed Aug 24 2005 - 13:25:50 CDT

    >>Kevin Burton has created an open source language detector written
    >>in Java (see
    >>and he's asking for contributions of sample data for additional languages.
    >Besides his blog page, and the existing SourceForge project name, he
    >has not provided anything for now (there's no source and no demo
    >available, not even an alpha version).

    The code is available via CVS. You can view it at:

    >I wonder if it's a good idea to provide him with such data, if he
    >does not want to publish anything in fact (there may be legal issues
    >with his source, notably if he used copyrighted materials such as
    >the paper he is citing).

    Leaving aside any legal speculations, my query was for references to
    _open_ sources of text ("...public data sets...").

    -- Ken

    Ken Krugler
    TransPac Software, Inc.
    +1 530-470-9200
