From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Sep 28 2005 - 03:56:56 CST
Dan Masarick wrote:
> Does anyone know a way to determine the method used to
> encode an MS Word document? I believe that 2000, XP
> and NT operating systems use UTF-16, but I am
> seeking confirmation.
I believe the Word format allows for several encodings, even within the
same document. So depending on how the text was entered (including the
Word version, the language of the UI, the host system and its system
language, and whether you pasted it, typed it, or used
InsertSpecialCharacter, Visual Basic or anything else), it can end up
either in UTF-16 or in some 8-bit based character set (including MBCS).
I believe current versions of Word no longer encode characters using Asian
character sets as they did years ago, since such text is difficult to read
without the proper fonts and decoding tables at hand.
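To illustrate how a single file can mix encodings, here is a minimal sketch that assumes the Word 97-2003 binary (.doc) layout described in Microsoft's later-published [MS-DOC] specification, not the XML-based .docx format. In that layout the text is split into "pieces", and each piece descriptor carries a 32-bit FcCompressed value: bit 30 marks an 8-bit piece stored at offset fc/2, otherwise the piece is UTF-16LE at offset fc. Locating the piece table itself (via the FIB and the Table stream) is not shown; `decode_piece` is a hypothetical helper, and cp1252 is used as an approximation of the spec's 8-bit mapping.

```python
# Bit 30 of a piece descriptor's 32-bit FcCompressed value flags an 8-bit
# ("compressed") piece; the low 30 bits hold the file offset (per [MS-DOC]).
F_COMPRESSED = 0x40000000
FC_MASK = 0x3FFFFFFF


def decode_piece(word_stream: bytes, fc_compressed: int, n_chars: int) -> str:
    """Decode one piece of text from the WordDocument stream (hypothetical helper)."""
    fc = fc_compressed & FC_MASK
    if fc_compressed & F_COMPRESSED:
        # 8-bit piece: one byte per character, starting at fc / 2.
        # The spec's byte-to-character table is approximated here with cp1252.
        start = fc // 2
        return word_stream[start:start + n_chars].decode("cp1252", errors="replace")
    # Unicode piece: UTF-16LE, two bytes per character, starting at fc.
    return word_stream[fc:fc + 2 * n_chars].decode("utf-16-le")


# Toy stream with an 8-bit piece at byte 4 and a UTF-16 piece at byte 20.
stream = bytes(4) + "héllo".encode("cp1252") + bytes(11) + "Ωmega".encode("utf-16-le")
print(decode_piece(stream, 8 | F_COMPRESSED, 5))  # héllo
print(decode_piece(stream, 20, 5))                # Ωmega
```

So there is no single answer for a whole document: each piece has to be checked, which matches the "it depends how the text got there" behaviour described above.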
If you look at the specifications for the RTF format (they were made
publicly available, so they should still be on the web), you will see
something similar, in gory detail.
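A rough sketch of the RTF side, assuming only three constructs from the RTF specification: `\'hh` writes one byte in the document's code page (announced by `\ansicpgN`), `\uN` writes a UTF-16 code unit as a signed 16-bit decimal, and `\ucN` says how many 8-bit fallback characters (default 1) follow each `\uN` for older readers. `decode_rtf_text` is a hypothetical helper for a plain run of body text only: no groups, font switches (`\fcharset`) or other control words, and the code page is assumed to be known already.

```python
import re

# One token per match: \'hh (a byte in the document code page), \ucN (fallback
# count), \uN (a signed 16-bit UTF-16 code unit), or any other single character.
# An optional space after \ucN or \uN is the control word's delimiter.
TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\uc(\d+) ?|\\u(-?\d+) ?|(.)", re.S)


def decode_rtf_text(text: str, codepage: str = "cp1252") -> str:
    """Decode a run of RTF body text (hypothetical helper; no groups or other control words)."""
    out = []
    pending = bytearray()  # consecutive \'hh bytes, decoded together (MBCS pairs)
    uc = 1                 # RTF default: one fallback character follows each \uN
    skip = 0               # fallback characters still to be dropped
    for m in TOKEN.finditer(text):
        hexbyte, ucval, uval, ch = m.groups()
        if skip and (hexbyte or ch is not None):
            skip -= 1      # drop the 8-bit fallback meant for older readers
            continue
        if hexbyte:
            pending.append(int(hexbyte, 16))
            continue
        if pending:
            out.append(pending.decode(codepage, errors="replace"))
            pending.clear()
        if ucval is not None:
            uc = int(ucval)
        elif uval is not None:
            unit = int(uval)
            out.append(chr(unit + 65536 if unit < 0 else unit))
            skip = uc
        else:
            out.append(ch)
    if pending:
        out.append(pending.decode(codepage, errors="replace"))
    # Recombine UTF-16 surrogate pairs written as two \uN controls;
    # unpaired surrogates become U+FFFD.
    return "".join(out).encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")


print(decode_rtf_text(r"caf\'e9 and na\u239?ve"))  # café and naïve
```

The example line mixes both mechanisms: "é" arrives as a code-page byte, "ï" as a Unicode value with a `?` fallback, which is the same kind of per-run mixing seen in Word documents.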
Antoine
This archive was generated by hypermail 2.1.5 : Wed Sep 28 2005 - 03:58:53 CST