Multi-lingual corpus?

From: Ken Krugler (ken@transpac.com)
Date: Wed Aug 24 2005 - 11:51:33 CDT

    Hi all,

    Kevin Burton has created an open source language detector written in
    Java (see
    <http://www.feedblog.org/2005/08/ngram_language_.html>)
    and he's asking for contributions of sample data for additional
    languages.

    Any suggestions for a multi-lingual corpus that could be used as
    training data? I believe he used some Wikipedia entries, but I'm
    hoping there are larger and more complete public data sets available.

    Thanks,

    -- Ken

    -- 
    Ken Krugler
    TransPac Software, Inc.
    <http://www.transpac.com>
    +1 530-470-9200
    

