Re: Multi-lingual corpus?

From: Tom Emerson (tree@basistech.com)
Date: Wed Aug 24 2005 - 13:26:03 CDT

Next message: Bruno Lowagie: "Re: Unicode TTF question"

Previous message: Ken Krugler: "Re: Multi-lingual corpus?"
In reply to: Philippe Verdy: "Re: Multi-lingual corpus?"
Next in thread: Philippe Verdy: "Re: Multi-lingual corpus?"
Reply: Philippe Verdy: "Re: Multi-lingual corpus?"

Philippe Verdy writes:
> I wonder if it's a good idea to provide him with such data, if he
> does not want to publish anything in fact (there may be legal issues
> with his source, notably if he used copyrighted materials such as
> the paper he is citing).

Well, the Cavnar and Trenkle paper has been around for a long time:
it's a trivial algorithm to implement, and has served as the
foundation for many of the open-source or freely available
language/encoding ID systems that are out there. Most notable is van
Noord's Perl "TextCat" program, which has profiles for 77
language/encoding pairs:

http://odur.let.rug.nl/~vannoord/TextCat/

Indeed, all of the data van Noord uses is included in his distribution.
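
To give a sense of how little code the approach needs, here is a minimal
Python sketch of the n-gram rank-order comparison as I understand it from
the Cavnar and Trenkle paper. It is not TextCat's code. The function names,
the single-underscore padding, the cutoff of 300 n-grams, and the
alphabetical tie-breaking are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Map each of the top_k most frequent n-grams (n = 1..max_n) to its rank."""
    counts = Counter()
    # Keep runs of letters only; digits and punctuation are discarded.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + token + "_"  # assumed padding: one marker on each side
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so results are stable.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def classify(text, lang_profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda name: out_of_place(doc, lang_profiles[name]))
```

In use, one profile would be built per language/encoding pair from training
text, e.g. `classify(sample, {"en": ngram_profile(en_text), "nl": ngram_profile(nl_text)})`,
where `en_text` and `nl_text` are hypothetical training strings.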

The copyright issue is a real one, and he'll need to be careful if he
decides to re-release the data.

-tree

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)

This archive was generated by hypermail 2.1.5 : Wed Aug 24 2005 - 13:27:04 CDT