From: Patrick Andries (Patrick.Andries@xcential.com)
Date: Mon Jul 14 2003 - 18:51:41 EDT
----- Original Message -----
From: "Philippe Verdy" <email@example.com>
> On Monday, July 14, 2003 11:42 PM, Patrick Andries
> > In any case, I believe Peter has an idea how these libraries work and
> > their limitations, he is rather looking for one with its limitations.
> Including the Chinese limitations? It will become tricky if managing with
> traditional or scientific texts with many rare ideographs, because it's
> difficult to create an exhaustive morphological analysis with Chinese,
This product does no morphological analysis; it uses a hidden Markov model.
Did you try it? (I just checked http://www.gov.tw/sars/ with
http://quebec.alis.com/castil/essai_silc.cgi, and it gave me Chinese, Big-5.)
Obviously the model is stochastic, but it can be fine-tuned by supplying a
larger (domain-specific, if needed) tagged corpus.
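To illustrate what "stochastic, trained on a tagged corpus" means in practice, here is a minimal sketch. It is NOT the Alis product's algorithm: it swaps in a simpler, fully visible byte-bigram Markov chain (no hidden states), with one model per (language, charset) label. The class name `ByteBigramIdentifier`, the labels, and the add-k smoothing are all hypothetical choices made for this example. Each label's model is estimated from tagged sample bytes, and an unknown text gets whichever label gives it the highest log-likelihood. Feeding more (or more domain-specific) samples to `train` is how such a model gets "fine-tuned."

```python
import math
from collections import defaultdict


class ByteBigramIdentifier:
    """Hypothetical sketch: one byte-bigram Markov chain per label,
    where a label is e.g. ("Chinese", "Big-5"). Not the Alis HMM."""

    def __init__(self, smoothing=1.0):
        # Add-k (Laplace) smoothing so unseen byte pairs keep a nonzero probability.
        self.smoothing = smoothing
        # label -> {(prev_byte, cur_byte): count}
        self.pair_counts = defaultdict(lambda: defaultdict(int))
        # label -> {prev_byte: number of bigrams starting with that byte}
        self.prev_counts = defaultdict(lambda: defaultdict(int))

    def train(self, label, sample: bytes):
        """Accumulate bigram counts from one tagged sample.
        Calling this with more samples refines that label's estimates."""
        for prev, cur in zip(sample, sample[1:]):
            self.pair_counts[label][(prev, cur)] += 1
            self.prev_counts[label][prev] += 1

    def log_likelihood(self, label, text: bytes) -> float:
        """Sum of log P(cur | prev) under the label's smoothed bigram model."""
        k = self.smoothing
        pairs = self.pair_counts.get(label, {})
        prevs = self.prev_counts.get(label, {})
        score = 0.0
        for prev, cur in zip(text, text[1:]):
            numerator = pairs.get((prev, cur), 0) + k
            denominator = prevs.get(prev, 0) + 256 * k  # 256 possible next bytes
            score += math.log(numerator / denominator)
        return score

    def identify(self, text: bytes):
        """Return the trained label with the highest log-likelihood.
        Raises ValueError if nothing has been trained yet."""
        return max(self.pair_counts, key=lambda label: self.log_likelihood(label, text))


# Usage with a toy "tagged corpus" (a real one would be far larger).
clf = ByteBigramIdentifier()
chinese = "中文資訊處理系統"
clf.train(("Chinese", "Big-5"), chinese.encode("big5"))
clf.train(("Chinese", "UTF-8"), chinese.encode("utf-8"))
clf.train(("French", "ISO-8859-1"),
          "Les textes en français sont très intéressants.".encode("latin-1"))

print(clf.identify("資訊系統".encode("big5")))
```

The same Chinese sentence is stored under two labels so that the byte-level statistics, not the language alone, decide the charset. A real identifier would need far more training text per label and some handling of very short inputs, where a handful of bigrams gives little evidence.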
An improved version is used by Netscape (at least this was my impression
when I left Alis).
- o - 0 - o -
Textes Unicode en français (Unicode texts in French)
This archive was generated by hypermail 2.1.5 : Mon Jul 14 2003 - 19:28:55 EDT