From: jarkko.hietaniemi@nokia.com
Date: Thu Aug 25 2005 - 03:33:23 CDT
 
> I tried his demo page with French only, and the conclusions
> are not good.
> - starting with "essai", it replied finnish
> - extending it to "un essai", it replied romanian
> - extending it to "un essai long", "un essai plus long", or
>   "un essai encore plus long", it replied "rumantsh"
> - extending it to "ceci est un essai long", "ceci est un essai
>   trop long", "ceci est un essai encore trop long", or "ceci est
>   un essai suffisant", it replied romanian again...
I think you are being much too harsh in your judgment. You would do well to sit
down for a moment and think about what it does, based on what input, and what
it outputs. Or, instead, you could have some fun and see what it does:
a         irish
au        welsh
auk       malay
auke      german
aukea     basque
aukeam    malay
aukeama   swahili
aukeamaa  sanskrit
aukeamaan finnish
('aukeamaan' is a valid Finnish word.)  My main point, I guess, is this: look at
the replies. 'a' is a valid word in MANY languages, yet it replies with only one.
Ditto for 'au', 'auk', and 'auke'. 'aukea', 'aukeama', and 'aukeamaa' are valid
Finnish words, but apparently they could be Basque, Swahili, and Sanskrit.
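To make the point about single answers concrete: a guesser that returns one label has to hide the ambiguity of short inputs. The Python sketch below uses a tiny, hand-made word list (hypothetical and far from complete; the Finnish entries are just the forms mentioned above) and returns every language that accepts the input, next to a hypothetical one-label guesser that keeps only the first candidate. A real identifier works from n-gram statistics rather than word lists; this only shows what a single label throws away.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny lexicon for illustration only.
LEXICON = {
    "finnish": {"aukea", "aukeama", "aukeamaa", "aukeamaan"},
    "french": {"a", "un", "essai", "ceci", "est"},
    "hungarian": {"a", "az"},
    "italian": {"a", "un"},
    "spanish": {"a", "un", "es"},
}


def candidate_languages(word: str) -> list[str]:
    """Return every language in LEXICON that contains `word`, sorted by name."""
    return sorted(lang for lang, words in LEXICON.items() if word in words)


def single_guess(word: str) -> str | None:
    """Mimic a one-label guesser: keep the first candidate, discard the rest."""
    candidates = candidate_languages(word)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    for w in ("a", "un", "aukea"):
        print(w, candidate_languages(w), single_guess(w))
```

For 'a' the candidate list has four entries, while the one-label guesser reports just one of them with no hint that the others were equally plausible.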
I believe a relatively simple exercise in statistics, using typical n-gram frequencies,
shows that you need dozens of letters to get reasonably reliable results.
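A minimal version of that exercise, under loudly stated assumptions: two hypothetical "languages" are reduced to slightly different single-letter (unigram, i.e. 1-gram) frequencies, which are invented numbers, not measurements of any real language. Random strings of a given length are drawn from one of them, and a naive-Bayes log-likelihood comparison guesses the source. Accuracy near 50% means the guess is little better than a coin flip; it only becomes reliable as the input gets longer.

```python
from __future__ import annotations

import math
import random

# Hypothetical letter frequencies for two closely related "languages".
ALPHABET = "aeiklmnstu"
LANG_A = [0.14, 0.09, 0.12, 0.11, 0.08, 0.07, 0.09, 0.10, 0.11, 0.09]
LANG_B = [0.10, 0.12, 0.09, 0.08, 0.10, 0.09, 0.11, 0.12, 0.08, 0.11]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def log_likelihood(text: str, probs: list[float]) -> float:
    """Sum of log-probabilities of each letter under a unigram model."""
    return sum(math.log(probs[INDEX[ch]]) for ch in text)


def guess(text: str) -> str:
    """Pick the language under which `text` is more likely (ties go to 'A')."""
    if log_likelihood(text, LANG_A) >= log_likelihood(text, LANG_B):
        return "A"
    return "B"


def accuracy(length: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of random `length`-letter strings whose source is guessed correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        truth = rng.choice("AB")
        probs = LANG_A if truth == "A" else LANG_B
        text = "".join(rng.choices(ALPHABET, weights=probs, k=length))
        correct += guess(text) == truth
    return correct / trials


if __name__ == "__main__":
    for n in (1, 3, 10, 30, 100, 300):
        print(f"{n:4d} letters: {accuracy(n):.1%} correct")
```

With these made-up frequencies a single letter is guessed correctly only a little more than half the time, and it takes on the order of a hundred letters before mistakes become rare. Real identifiers choose among many more languages, some of them quite similar to one another.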
This archive was generated by hypermail 2.1.5 : Thu Aug 25 2005 - 03:36:23 CDT