Multi-lingual corpus?

From: Ken Krugler
Date: Wed Aug 24 2005 - 11:51:33 CDT

    Hi all,

    Kevin Burton has created an open source language detector written in
    Java, and he's asking for contributions of sample data for additional
    languages.

    Any suggestions for a multi-lingual corpus that could be used as
    training data? I believe he used some Wikipedia entries, but I'm
    hoping there are larger and more complete public data sets available.


    -- Ken

    Ken Krugler
    TransPac Software, Inc.
    +1 530-470-9200
