RE: Identifying file encoding scheme

From: Krebs, Mike (MKrebs@bofasecurities.com)
Date: Mon Sep 13 1999 - 20:51:05 EDT


Erland,

Did you try it on NT? The program won't reproduce the problem on any other
Windows variant, because NT implements Unicode support more widely in the
underlying API than the other Windows versions (95, 98, etc.).
Specifically, the problem (as I discovered with the much appreciated
assistance of Markus Scherer at IBM) is with the API call IsTextUnicode().
This function uses an unspecified statistical model to guess whether the
text in a buffer is Unicode. The documentation states that you may specify
one or more built-in tests for the function to use when answering the
question, "Is the text Unicode?". The remarks at the end of the
documentation for this function seem to address my problem:

Remarks
As noted in the preceding table of flag constants, the
IS_TEXT_UNICODE_STATISTICS and IS_TEXT_UNICODE_REVERSE_STATISTICS tests use
statistical analysis. These tests are not foolproof. The statistical tests
assume certain amounts of variation between low and high bytes in a string,
and some ASCII strings can slip through. For example, if lpBuffer points to
the ASCII string 0x41, 0x0A, 0x0D, 0x1D (A\n\r^Z), the string passes the
IS_TEXT_UNICODE_STATISTICS test, though failure would be preferable.

This function is implemented only under Windows NT, and is presumably used
by any program that wants to support Unicode, such as Notepad and (more
importantly) BCP.
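
For anyone who wants to poke at this directly, here is a rough Python sketch
(not part of my original test) that rebuilds the bytes written by the Perl
script quoted below and, on Windows only, hands them to IsTextUnicode()
through ctypes. The helper names confusing_bytes and is_text_unicode are
made up for illustration. The CR/LF line ending is my assumption about what
Perl's text-mode output produces on NT. The two flag values are the ones
from the Win32 headers. The sketch makes no claim about what the function
will answer; it only shows the input and how the call can be made.

```python
import ctypes
import sys

# Flag values as defined in the Win32 headers.
IS_TEXT_UNICODE_STATISTICS = 0x0002
IS_TEXT_UNICODE_REVERSE_STATISTICS = 0x0020


def confusing_bytes(newline=b"\r\n"):
    """Rebuild the bytes the quoted Perl script writes (hypothetical helper).

    Assumption: Perl's text-mode output on Windows turns "\n" into CR LF.
    """
    line = b"A" + (b"A" * 3 + b"B") * 100 + newline
    return line * 2


def is_text_unicode(buf, flags):
    """Call IsTextUnicode on Windows; return None on any other platform."""
    if sys.platform != "win32":
        return None
    func = ctypes.windll.advapi32.IsTextUnicode
    func.argtypes = [ctypes.c_void_p, ctypes.c_int,
                     ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    result = ctypes.c_int(flags)  # in: tests to run, out: tests passed
    ok = func(buf, len(buf), ctypes.byref(result))
    return bool(ok), result.value


data = confusing_bytes()
print(len(data), "bytes")

# The first eight bytes, read as little-endian 16-bit units.
units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, 8, 2)]
print([hex(u) for u in units])

print(is_text_unicode(data, IS_TEXT_UNICODE_STATISTICS))
```

Read as 16-bit units, the file contains no zero bytes and both bytes of
every unit are letters, which is at least suggestive of how a test based on
byte variation could be led astray.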

If you *are* using NT 4.0 and you still can't get my example to work, I
would be interested to talk to you on the side to see if there is a switch
you are setting to reduce your Unicode support.

Thanks,

Michael Krebs
Bank of America Securities

> -----Original Message-----
> From: sommar@algonet.se [SMTP:sommar@algonet.se]
> Sent: Monday, September 13, 1999 5:18 PM
> To: Unicode List
> Subject: RE: Identifying file encoding scheme
>
> > From unicode@unicode.org Fri Sep 10 00:35:32 1999
> Montgomery Securities wrote:
> > 3) For perl programmers, this program will generate a file that will
> > confuse NT:
> >
> > unless (open(OUTFILE, ">c:\\confused.txt")) {die("cannot open file.\n");}
> > $c1 = "A";
> > $c2 = "B";
> > printf OUTFILE $c1 . ((($c1 x 3) . $c2) x 100) . "\n" . $c1 . ((($c1 x 3) . $c2) x 100) . "\n";
>
> I tried your script, and I typed the file in the DOS box. I opened
> the file in Notepad. I bulk copied it to SQL Server. And absolutely
> nothing strange happened.
>
> --
> Erland Sommarskog, Stockholm, sommar@algonet.se



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT