From: Peter Kirk
Date: Wed Jan 14 2004

    On 14/01/2004 09:25, Mark Davis wrote:

    >I'm not sure which "one suggested heuristic method" you are referring to, ...
    Basically the one that relies on UTF-16 text being likely to contain
    many zero bytes in either the odd or the even positions.
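
    As a rough illustration of that zero-byte heuristic, here is a minimal
    Python sketch. The function name and the simple "more zeros wins" rule
    are assumptions made for this example, not anything specified in this
    thread. Note that it returns None for Thai text, whose UTF-16 code
    units contain no zero bytes.

```python
def guess_utf16_order_by_zero_bytes(data: bytes):
    """Guess UTF-16 byte order from where zero bytes fall (hypothetical helper).

    Returns "UTF-16BE", "UTF-16LE", or None when the zero bytes favour neither.
    """
    even_zeros = data[0::2].count(0)  # zero bytes at offsets 0, 2, 4, ...
    odd_zeros = data[1::2].count(0)   # zero bytes at offsets 1, 3, 5, ...
    if even_zeros > odd_zeros:
        # Big-endian puts the high byte first, so Latin-range text
        # has its zero bytes at even offsets.
        return "UTF-16BE"
    if odd_zeros > even_zeros:
        return "UTF-16LE"
    return None  # no zero bytes at all, or an exact tie
```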

    >... but
    >you are jumping to conclusions. For example, one of the heuristics is to judge
    >which characters are more common when bytes are interpreted as if they were in
    >different encoding schemes. When picking between UTF-16BE and UTF-16LE, U+0020 is
    >*still* much more common than U+2000, even in Thai.
    Not necessarily. In certain texts neither might occur at all, so the
    heuristic fails.
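
    To make both the heuristic and its failure case concrete, here is a
    small Python sketch. The function name and the tie-breaking rule are
    assumptions for illustration only. It counts byte pairs that would be
    U+0020 under each byte order (the same pair reads as U+2000 under the
    other order) and gives up when neither pattern occurs.

```python
def guess_utf16_order_by_space(data: bytes):
    """Guess UTF-16 byte order from U+0020 vs U+2000 counts (hypothetical helper).

    Returns "UTF-16BE", "UTF-16LE", or None when the counts do not decide.
    """
    be_spaces = 0  # pairs 00 20: U+0020 if big-endian, U+2000 if little-endian
    le_spaces = 0  # pairs 20 00: U+0020 if little-endian, U+2000 if big-endian
    for i in range(0, len(data) - 1, 2):
        pair = data[i:i + 2]
        if pair == b"\x00\x20":
            be_spaces += 1
        elif pair == b"\x20\x00":
            le_spaces += 1
    if be_spaces > le_spaces:
        return "UTF-16BE"
    if le_spaces > be_spaces:
        return "UTF-16LE"
    return None  # neither character occurs (or a tie), so the heuristic fails
```

    A Thai string with no spaces makes this return None, whereas the same
    string with a single space in it is classified correctly.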

    I agree with Mark S and others that more sophisticated methods are
    likely to be safer.

