Identifying file encoding scheme

From: Montgomery Securities (mkrebs@primebroker.com)
Date: Wed Sep 08 1999 - 16:16:10 EDT


I am new to the list, so please forgive the question if it is too simple or
off-topic; I have had trouble finding information on my problem in any of
the FAQs I've seen.

How does a software package written on an operating system that supports
ASCII as well as Unicode (Windows NT) identify the encoding scheme that a
text file on disk uses? Is there any special marking at the front of a
Unicode file that helps distinguish it from an 8-bit file?
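
On the marking question: the Unicode Standard does define such a marker, the
byte order mark (U+FEFF), which a writer may place at the start of a file. It
is optional, so its absence proves nothing. A minimal Python sketch of the
check follows; sniff_bom is a hypothetical helper name, not part of any
library.

```python
import codecs

# BOM signatures, longest first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return the encoding named by a leading BOM, or None if there is none.

    `head` is the first few bytes of the file (hypothetical helper).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

A result of None means only that no mark was found; the reader then has to
fall back on a default or guess from the byte content.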

More specifically, I am reading a flat text file that was created as 8-bit
ASCII, with every character in the file falling in the 7-bit range. When I
view the file in Notepad, I get garbage that is half the length of the
actual file, which leads me to believe Notepad is displaying the file as
16-bit text. Notepad is a Unicode-aware program, and I believe it is
mistakenly identifying the file as Unicode. Other programs I have tried:
"TYPE" from a shell also produces garbage, while "CAT" from a command shell
and Wordpad both display the file correctly.
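
The half-length symptom is what one would expect if each pair of 8-bit
characters were being read as a single 16-bit code unit. A small Python
sketch of that effect, using made-up pipe-delimited rows rather than the
real export:

```python
# Hypothetical sample rows; the real file's contents are not known.
data = b"1001|Smith|NY\n1002|Jones|CA\n"

# Correct reading: one byte per character.
right = data.decode("ascii")

# Mistaken reading: every two bytes are fused into one 16-bit code unit.
wrong = data.decode("utf-16-le")

print(len(right), len(wrong))  # the mistaken text has half as many characters
```

Because every byte is below 0x80, each fused pair lands on an unrelated but
valid character (mostly CJK), so the decoder reports no error and the result
simply looks like garbage.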

A little background: the file is a table exported from an Informix
database on a Sun machine running UNIX, which I need to import into SQL
Server on Windows NT. The utility that imports pipe-delimited files from
disk into a table on NT is reading garbage as well.

Thanks for your help!

Michael Krebs
mkrebs@montgomery.com


