Re: encoding sniffing

From: Patrick Andries (Patrick.Andries@xcential.com)
Date: Mon Jul 14 2003 - 17:42:06 EDT


    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>

    > On Monday, July 14, 2003 10:14 PM, Peter_Constable@sil.org
    > <Peter_Constable@sil.org> wrote:
    >
    > > Are there any libraries out there (open-source or otherwise) that can
    > > be used to detect the character encoding of a file or data stream?
    >
    > Yes, but these libraries actually try to detect the encoded
    > language: they first apply strict validity rules to discriminate
    > between the possible encodings, then statistical rules to match
    > the languages against their various encoded byte sequences, and
    > finally lists of common words.

    I know one such library (http://quebec.alis.com/castil/essai_silc.cgi), and
    it does not use the three-step approach you outline above, but a single-step
    one.
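The three-step approach described in the quoted message (strict validity rules to rule out encodings, statistical rules to match languages, then common words) can be sketched in Python roughly as follows. This is a toy illustration under stated assumptions, not how any particular library (including the one linked above) works: the candidate encodings, the per-language alphabets and the word lists are hypothetical choices made for the example, and the "statistical" step is reduced to a simple letter-membership ratio rather than real byte or n-gram frequency tables.

```python
from __future__ import annotations

import re

# Hypothetical candidate encodings (an illustrative assumption, not exhaustive).
CANDIDATES = ["utf-8", "iso-8859-1", "cp1252", "koi8-r"]

# Letters each language is assumed to use (illustrative assumption).
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "fr": set("abcdefghijklmnopqrstuvwxyzàâçéèêëîïôûùüÿœæ"),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}

# A handful of very common words per language (tiny, hypothetical lists).
COMMON_WORDS = {
    "en": {"the", "and", "of", "to", "is", "a", "in"},
    "fr": {"le", "la", "les", "de", "et", "est", "une", "des"},
    "ru": {"и", "в", "не", "на", "что", "это"},
}


def valid_decodings(data: bytes) -> dict[str, str]:
    """Step 1: keep only the encodings under which the bytes decode strictly."""
    decoded = {}
    for enc in CANDIDATES:
        try:
            decoded[enc] = data.decode(enc)  # errors="strict" is the default
        except UnicodeDecodeError:
            pass  # invalid byte sequence: this encoding is ruled out
    return decoded


def letter_score(text: str, lang: str) -> float:
    """Step 2 (simplified): fraction of letters found in the language's alphabet."""
    letters = [c.lower() for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in ALPHABETS[lang] for c in letters) / len(letters)


def word_score(text: str, lang: str) -> float:
    """Step 3: fraction of words found in the language's common-word list."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS[lang] for w in words) / len(words)


def detect(data: bytes) -> tuple[str, str] | None:
    """Return the (encoding, language) pair with the highest combined score."""
    best, best_score = None, -1.0
    for enc, text in valid_decodings(data).items():
        for lang in ALPHABETS:
            score = letter_score(text, lang) + word_score(text, lang)
            if score > best_score:
                best, best_score = (enc, lang), score
    return best


if __name__ == "__main__":
    sample = "Le café est une boisson et la crème est bonne".encode("utf-8")
    print(detect(sample))  # ('utf-8', 'fr')
```

Note that step 1 alone decides very little: ISO-8859-1 and KOI8-R assign a character to every byte value, so they never fail a strict decode, and the language-based steps end up doing most of the work.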

    In any case, I believe Peter already has an idea of how these libraries work
    and what their limitations are; what he is looking for is an actual library,
    limitations and all.

    P. Andries
    - o - 0 - o -
    Unicode texts in French
    http://pages.infinit.net/hapax



    This archive was generated by hypermail 2.1.5 : Mon Jul 14 2003 - 18:22:29 EDT