From: Tom Emerson (email@example.com)
Date: Wed Aug 24 2005 - 13:26:03 CDT
Philippe Verdy writes:
> I wonder if it's a good idea to provide him with such data, if he
> does not want to publish anything in fact (there may be legal issues
> with his source, notably if he used copyrighted materials such as
> the paper he is citing).
Well, the Cavnar and Trenkle paper has been around for a long time:
it's a trivial algorithm to implement, and has served as the
foundation for many of the open sourced or freely available
language/encoding ID systems that are out there. Most notable is van
Noord's Perl "TextCat" program, which has profiles for 77
Indeed, all of the data van Noord uses is included in his distribution.
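For anyone who has not seen it, the rank-order n-gram idea from the Cavnar and Trenkle paper can be sketched roughly as below. This is only an illustration under assumptions, not van Noord's code: the function names, the underscore padding, the alphabetical tie-breaking, and the 300-entry cutoff are all choices made for this sketch, and real profiles would be built from far more text than a sentence or two.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    most frequent first."""
    counts = Counter()
    # Keep runs of letters only; pad each token with "_" to mark word edges.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + token + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically (an assumption
    # made here so that the ranking is deterministic).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles; an n-gram missing
    from lang_profile costs the maximum penalty len(lang_profile)."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def classify(text, lang_profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```

Usage would be along the lines of building `lang_profiles = {"en": ngram_profile(english_text), "de": ngram_profile(german_text)}` from whatever training text one is entitled to use, then calling `classify(unknown_text, lang_profiles)`.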
The copyright issue is a real one, and he'll need to be careful if he
decides to re-release the data.
--
Tom Emerson
Basis Technology Corp.
Software Architect
http://www.basistech.com
"You can't fake quality any more than you can fake a good meal." (W.S.B.)