Re: How to distinguish UTF-8 from Latin-* ?

From: Doug Ewell (dewell@compuserve.com)
Date: Wed Jun 21 2000 - 00:27:36 EDT


Kenneth Whistler <kenw@sybase.com> wrote:

> But if I invented a hoity-toity company name with extra accents for
> "class", such as, L·DÏ·DÀ® Productions, Inc. and sent this to you in
> ISO 8859-1, as I am currently doing, your sanity check will fail in
> this case and identify this file as UTF-8, with 3 characters
> misinterpreted.

Still, you have to admit this is an extremely contrived case. Either
Bob's or Tim's heuristic should work most of the time, and for some
applications, "most of the time" is good enough. The user should, of
course, be allowed to override the program's decision manually.
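Neither Bob's nor Tim's heuristic is spelled out in this message, so what follows is only a minimal sketch of one common approach, not either of theirs. It tries a strict UTF-8 decode and falls back to ISO 8859-1, which accepts any byte sequence. It also lets the user override the guess. The names `guess_encoding` and `override` are hypothetical. The second example shows the failure mode Ken describes: Latin-1 text whose bytes happen to form well-formed UTF-8 gets misidentified.

```python
from typing import Optional


def guess_encoding(data: bytes, override: Optional[str] = None) -> str:
    """Guess whether `data` is UTF-8 or ISO 8859-1 (Latin-1).

    Sketch only: if the bytes decode as well-formed UTF-8, assume UTF-8;
    otherwise fall back to Latin-1. `override` is a hypothetical parameter
    carrying the user's manual choice, which always wins.
    """
    if override is not None:
        return override  # user's decision takes precedence over the heuristic
    try:
        data.decode("utf-8")  # strict mode: raises on ill-formed sequences
    except UnicodeDecodeError:
        return "iso-8859-1"
    # Pure ASCII also lands here; it is valid in both encodings.
    return "utf-8"


# Typical Latin-1 text: 0xE9 ('é') followed by a space is not valid UTF-8.
print(guess_encoding("café au lait".encode("latin-1")))  # iso-8859-1

# Contrived Latin-1 text: 'Ã' (0xC3) followed by '©' (0xA9) is also the
# UTF-8 encoding of U+00E9, so the heuristic guesses wrong here.
print(guess_encoding("Ã©tÃ©".encode("latin-1")))  # utf-8

# The user corrects the guess manually.
print(guess_encoding("Ã©tÃ©".encode("latin-1"), override="iso-8859-1"))  # iso-8859-1
```

Ordinary accented Latin-1 text almost never forms valid multi-byte UTF-8 sequences by accident, which is why a check like this is right most of the time.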

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT