From: Patrick Andries (Patrick.Andries@xcential.com)
Date: Mon Jul 14 2003 - 17:42:06 EDT
----- Original Message -----
From: "Philippe Verdy" <verdy_p@wanadoo.fr>
> On Monday, July 14, 2003 10:14 PM, Peter_Constable@sil.org
<Peter_Constable@sil.org> wrote:
>
> > Are there any libraries out there (open-source or otherwise) that can
> > be used to detect the character encoding of a file or data stream?
>
> Yes, but these libraries actually try to detect the actual encoded
> language, based on strict validity rules to discriminate first the
> possible encodings, then statistic rules to try matching the
> languages with their various encoded byte sequences, then with
> the help of common words.
I know of one such library (http://quebec.alis.com/castil/essai_silc.cgi), and
it does not use the three-step approach you outline above; it uses a single
step.
In any case, I believe Peter already has an idea of how these libraries work
and of their limitations; he is simply looking for one, limitations and all.
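
To make the "strict validity rules" step from the quoted message concrete, here is a minimal Python sketch. It is not how the library mentioned above works, nor any other particular library; the function name and the candidate list are hypothetical, chosen only for illustration. It keeps the encodings under which the bytes decode without error.

```python
def plausible_encodings(data, candidates=("ascii", "utf-8", "iso-8859-1")):
    """Return the candidate encodings under which `data` decodes cleanly.

    Hypothetical helper: only the validity filter, with no statistics
    and no word lists.
    """
    survivors = []
    for name in candidates:
        try:
            # Strict decoding raises on any byte sequence that is
            # illegal in this encoding.
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue
        survivors.append(name)
    return survivors


if __name__ == "__main__":
    print(plausible_encodings("français".encode("utf-8")))
    print(plausible_encodings("français".encode("iso-8859-1")))
```

ISO-8859-1 accepts every byte sequence, so it always survives this filter. That is one of the limitations in question: validity alone cannot separate UTF-8 from ISO-8859-1, which is why statistics or word lists are brought in afterwards.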
P. Andries
- o - 0 - o -
Textes Unicode en français
http://pages.infinit.net/hapax