Re: How to distinguish UTF-8 from Latin-* ?

From: Daniel Biddle
Date: Wed Jun 21 2000 - 02:27:17 EDT

On Tue, 2000-06-20, Doug Ewell wrote:

> Kenneth Whistler wrote:
> > But if I invented a hoity-toity company name with extra accents for
> > "class", such as, LÂ·DÏ·DÀ® Productions, Inc. and sent this to you in
> > ISO 8859-1, as I am currently doing, your sanity check will fail in
> > this case and identify this file as UTF-8, with 3 characters
> > misinterpreted.
> Still, you have to admit this is an extremely contrived case.

A much less contrived case, suggested by someone on this list a while ago:
it's easy to find Web pages containing "NESCAFÉ" or "NESTLÉ" in upper-case
letters, so there's a good chance that plain text files exist right now
containing "NESCAFÉ®" or "NESTLÉ®"; both of these strings could be
misinterpreted as containing a valid UTF-8 byte sequence.
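
As a rough illustration, here is a minimal Python sketch, assuming the
"sanity check" is nothing more than "does the byte stream decode as UTF-8
without error?". The helper name looks_like_utf8 is made up for this
example. In ISO 8859-1, E-acute followed by the registered sign is the byte
pair C9 AE, which is also the well-formed UTF-8 encoding of U+026E, so the
check passes on those strings.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Naive check (an assumption, not a real detector):
    True if the bytes decode as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for word in ("NESCAFÉ", "NESTLÉ", "NESCAFÉ®", "NESTLÉ®"):
        raw = word.encode("latin-1")  # the bytes as stored in an 8859-1 file
        print(word, raw.hex(), looks_like_utf8(raw))
```

A lone trailing C9 is rejected, because a UTF-8 lead byte needs a
continuation byte after it; it is the addition of the registered sign that
turns the pair into a valid two-byte sequence. A check of this kind can
only say that a file might be UTF-8, not that it is.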

Daniel Biddle
