    Philippe Verdy writes:
    > I wonder if it's a good idea to provide him with such data, if he
    > does not want to publish anything in fact (there may be legal issues
    > with his source, notably if he used copyrighted materials such as
    > the paper he is citing).

    Well, the Cavnar and Trenkle paper has been around for a long time.
    The algorithm is trivial to implement, and it has served as the
    foundation for many of the open-source or freely available
    language/encoding ID systems out there. The most notable is van
    Noord's Perl "TextCat" program, which has profiles for 77
    language/encoding pairs:

    Indeed, all of the data van Noord uses is included in his distribution.
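
    To give a sense of how little code the approach needs, here is a
    rough Python sketch of the n-gram rank-order idea as I understand it
    from the paper. It is not TextCat's code: the function names
    (ngram_profile, out_of_place, identify), the single-underscore word
    padding, and the 300-entry profile cutoff are my own assumptions for
    illustration.

```python
from collections import Counter
import re


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    # Split into runs of letters; digits and punctuation are discarded.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"  # assumption: one pad character per side
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def identify(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))
```

    You would build one profile per language/encoding pair from sample
    text, e.g. profiles = {"en": ngram_profile(english_sample), ...},
    and then call identify(unknown_text, profiles). The quality of the
    result depends almost entirely on the training samples.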

    The copyright issue is a real one, and he'll need to be careful if he
    decides to re-release the data.


