From: Mark E. Shoulson (firstname.lastname@example.org)
Date: Wed Jan 14 2004 - 00:05:32 EST
On 01/13/04 05:40, Marco Cimarosti wrote:
>Peter Kirk wrote:
>>This one also looks dangerous.
>What do you mean by "dangerous"? This is a heuristic algorithm, so it is
>not supposed to work always, but only in some lucky cases.
>If lucky cases average to, say, 20% or less, then it is a bad and useless
>algorithm; if they average to, say, 80% or more, then it is good and
>useful. But you can't ask that it work in 100% of cases, or it
>wouldn't be a heuristic anymore.
If it's a heuristic we're after, then why split hairs and try to make
all the rules ourselves? Get a big ol' mess of training data in as many
languages as you can and hand it over to a class full of CS graduate
students studying Machine Learning. Throw it at some neural networks,
go Bayesian with digraphs, whatever. Analyzing multigraph frequency
(say, strings of up to four characters) would probably do a pretty
decent job just by itself.
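
To make the multigraph idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions, not a tested detector: the function names (ngrams, train, guess), the add-one smoothing, and the two tiny SAMPLES strings are all made up for the example, and real training data would have to be far larger and cover many more languages.

```python
import math
from collections import Counter

def ngrams(text, max_n=4):
    """Yield every character n-gram of length 1..max_n from text."""
    # Pad with spaces so word-initial and word-final patterns are counted too.
    text = " " + text.lower() + " "
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]

def train(samples, max_n=4):
    """Build one n-gram frequency table per label.

    samples maps a label (e.g. a language name) to its training text.
    """
    return {label: Counter(ngrams(text, max_n)) for label, text in samples.items()}

def guess(models, text, max_n=4):
    """Return the label whose n-gram table scores text highest.

    The score is a sum of log relative frequencies with add-one
    smoothing, so n-grams never seen in training are penalized
    rather than driving the score to minus infinity.
    """
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    scores = {}
    for label, counts in models.items():
        denom = sum(counts.values()) + len(vocab) + 1
        scores[label] = sum(
            math.log((counts[g] + 1) / denom) for g in ngrams(text, max_n)
        )
    return max(scores, key=scores.get)

# Hypothetical toy training data, far too small for real use.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and then the dog sleeps in the sun",
    "italian": "la volpe veloce salta sopra il cane pigro "
               "e poi il cane dorme al sole",
}

models = train(SAMPLES)
print(guess(models, "the dog sleeps"))
print(guess(models, "il cane dorme"))
```

Nothing in there is a hand-written rule: the only knowledge is the frequency tables, so making it better is a matter of feeding it more text rather than arguing over special cases.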