Re: Cp1256 (Windows Arabic) Characters not supported by UTF8

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Aug 11 2005 - 00:18:10 CDT


    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > To detect a language, you could also try searching for very common
    > terms like "the", "is", "are", "have", "and" in English, "le", "un",
    > "a", "à", "est", "et" in French, "der", "das", "ist" in German.

    This is not a bad heuristic in general, but I don't think I'd suggest
    using "a" as an indication that the text is in French. That word has a
    tendency to occur in English now and then.
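The stopword heuristic quoted above, together with the caveat about "a", can be sketched in Python. This is a minimal illustration with several assumptions. The word lists are only the terms quoted above, not a real language-identification resource. The function names `tokenize`, `score_languages` and `guess_language` are hypothetical. A tie between the top-scoring languages is treated as "undecided". The sample sentence shows how adding "a" to the French list turns a clear English match into a tie.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative stopword lists built from the terms quoted above (an
# assumption, not a real language-ID resource). "a" is deliberately left
# out of the French list because it also occurs in English.
STOPWORDS = {
    "en": {"the", "is", "are", "have", "and"},
    "fr": {"le", "un", "à", "est", "et"},
    "de": {"der", "das", "ist"},
}


def tokenize(text: str) -> list[str]:
    # In Python 3, \w matches Unicode letters, so "à" stays a single token.
    return re.findall(r"\w+", text.lower())


def score_languages(text: str, stopwords: dict[str, set[str]]) -> dict[str, int]:
    """Count how many tokens of `text` appear in each language's list."""
    counts = Counter(tokenize(text))
    return {lang: sum(counts[w] for w in words) for lang, words in stopwords.items()}


def guess_language(text: str, stopwords: dict[str, set[str]] = STOPWORDS) -> str | None:
    """Return the best-scoring language, or None if nothing matched or the top is tied."""
    scores = score_languages(text, stopwords)
    best = max(scores.values())
    winners = [lang for lang, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


if __name__ == "__main__":
    sample = "I have a cat and a dog"
    print(guess_language(sample))  # en
    # Putting "a" into the French list makes the same sentence a tie.
    with_a = dict(STOPWORDS, fr=STOPWORDS["fr"] | {"a"})
    print(guess_language(sample, with_a))  # None
```

With the original lists, "have" and "and" give English two hits and French none; once "a" counts as French, its two occurrences match English's two hits and the guess becomes ambiguous.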

    --
    Doug Ewell
    Fullerton, California
    http://users.adelphia.net/~dewell/
    


    This archive was generated by hypermail 2.1.5 : Thu Aug 11 2005 - 00:20:58 CDT