Re: Cp1256 (Windows Arabic) Characters not supported by UTF8

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Aug 11 2005 - 09:00:37 CDT


    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> This is not a bad heuristic in general, but I don't think I'd suggest
    >> using "a" as an indication that the text is in French. That word has
    >> a tendency to occur in English now and then.
    >
    > I know, but it counts positively toward both French and English
    > (probably more in English than in French, where it is just a common
    > conjugated form of an essential auxiliary verb). The idea is not to
    > count single words, but to compute a summary statistic for lists of
    > candidate languages, using lists of words rated by occurrence
    > probability. Such a list of words will be much larger than the few
    > examples I gave, and will include other common words and contractions.

    That makes a lot more sense. Thank you for the clarification.
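
    For illustration, here is a minimal sketch of the kind of summary
    statistic described above. It assumes a simple unigram model: each
    candidate language has a word list with occurrence probabilities, and a
    text's score for a language is the sum of the log-probabilities of its
    words, with a small floor probability for words missing from the list.
    The word lists, the probability values, the floor, and the function
    names `score_languages` and `guess_language` are all hypothetical, and
    this is not presented as Philippe's actual method.

```python
import math
import re

# Hypothetical toy word lists: each maps a common word to an ASSUMED
# probability of occurrence in running text. The numbers are illustrative
# only, not measured frequencies; real lists would be far larger.
WORD_PROBS = {
    "en": {"the": 0.06, "of": 0.03, "and": 0.03, "a": 0.02, "is": 0.01, "to": 0.025},
    "fr": {"le": 0.03, "la": 0.03, "de": 0.04, "et": 0.02, "a": 0.01, "est": 0.01, "les": 0.02},
}

# Assumed probability for a word absent from a language's list, so that a
# single unknown word does not push that language's score to minus infinity.
UNSEEN = 1e-5


def score_languages(text, word_probs=WORD_PROBS, unseen=UNSEEN):
    """Return a summed log-probability score for each candidate language."""
    # Split into runs of letters (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {}
    for lang, probs in word_probs.items():
        # Every word contributes to every language; common words such as
        # "a" count toward several languages, weighted by their probability.
        scores[lang] = sum(math.log(probs.get(w, unseen)) for w in words)
    return scores


def guess_language(text):
    """Return the candidate language with the highest score."""
    scores = score_languages(text)
    return max(scores, key=scores.get)
```

    With these toy lists, guess_language("the cat is on the mat") returns
    "en" and guess_language("le chat est sur la table") returns "fr". A text
    consisting only of "a" still scores for both languages, just higher for
    English, which is the behaviour the quoted paragraph describes.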

    --
    Doug Ewell
    Fullerton, California
    http://users.adelphia.net/~dewell/
    

