Re: Encoding Method UTF8 or UTF16

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Sep 28 2005 - 03:56:56 CST

    Dan Masarick wrote:
    > Does anyone know a way to determine the method used to
    > encode an MS Word document? I believe that the 2000, XP
    > and NT operating systems use UTF-16, but I am
    > seeking confirmation.

    I believe the Word format allows for various encodings, even within the
    same document. So depending on how the text was entered (including the
    version of Word used, the language of the UI, the host system, its
    system language, and whether you pasted, typed, or used
    InsertSpecialCharacter, Visual Basic or anything else), it can end up either
    in UTF-16 or in some 8-bit based (including MBCS) character set. (I believe
    current versions of Word no longer encode characters using Asian character
    sets as they did years ago, since such text is difficult to read if you do
    not have the proper fonts and decoding tables at hand.)
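    To make the "several encodings in one document" point concrete, here is a
    minimal Python sketch of how a reader might pick the codec for one run of
    text in the binary .doc format. It assumes the layout I recall from
    Microsoft's published description of the format: each run carries a 32-bit
    value whose bit 30 says whether the run is stored as 8-bit characters (at
    half the stated offset) or as UTF-16LE. The function names are hypothetical,
    and treating the 8-bit runs as plain cp1252 is a simplification.

```python
def describe_piece(fc_compressed: int, char_count: int):
    """Return (byte_offset, byte_length, codec) for one run of text.

    Assumption: bit 30 of the 32-bit value flags an 8-bit run, and the low
    30 bits hold the offset (doubled when the run is 8-bit).
    """
    is_8bit = bool(fc_compressed & 0x40000000)   # bit 30
    fc = fc_compressed & 0x3FFFFFFF              # low 30 bits
    if is_8bit:
        # 8-bit run: one byte per character, real offset is fc / 2.
        return fc // 2, char_count, "cp1252"
    # 16-bit run: two bytes per character, offset used as-is.
    return fc, char_count * 2, "utf-16-le"


def decode_piece(stream: bytes, fc_compressed: int, char_count: int) -> str:
    """Decode one run out of the raw text stream."""
    offset, length, codec = describe_piece(fc_compressed, char_count)
    return stream[offset:offset + length].decode(codec, errors="replace")


# A toy stream: two 8-bit characters followed by two UTF-16LE characters.
stream = b"Hi" + "\u03a9!".encode("utf-16-le")
first = decode_piece(stream, 0x40000000, 2)   # 8-bit run at offset 0
second = decode_piece(stream, 2, 2)           # UTF-16LE run at offset 2
```

    The toy stream only mimics the idea; a real document also needs the
    container and the table of runs to be parsed first.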

    If you have a look at the specification for the RTF format (which was
    publicly available, so it should still be available on the web), you will
    see something similar, with gory details.
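    As a rough illustration of the RTF case, the Python sketch below decodes a
    text fragment that mixes the two escape forms: \'hh (a byte in the
    document's code page, possibly part of a multi-byte character) and \uN (a
    16-bit Unicode value followed by fallback characters to skip). The function
    name and its parameters are hypothetical; it assumes the fragment contains
    only plain text and these two escapes, that the code page and the \uc skip
    count are already known, and it does not combine surrogate pairs.

```python
import re

# \uN with an optional delimiting space, or \'hh with two hex digits.
TOKEN = re.compile(r"\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})")


def decode_rtf_text(fragment: str, codepage: str = "cp1252", uc: int = 1) -> str:
    out = []
    pending = bytearray()   # bytes from consecutive \'hh escapes
    skip = 0                # fallback characters still to ignore after \uN
    pos = 0

    def flush():
        # Decode accumulated bytes together so multi-byte characters work.
        if pending:
            out.append(pending.decode(codepage, errors="replace"))
            pending.clear()

    while pos < len(fragment):
        m = TOKEN.match(fragment, pos)
        end = m.end() if m else pos + 1
        if m and m.group(1) is not None:
            # \uN: N is written as a signed 16-bit value.
            flush()
            out.append(chr(int(m.group(1)) % 65536))
            skip = uc
        elif skip:
            skip -= 1       # drop one fallback character (or one \'hh)
        elif m:
            pending.append(int(m.group(2), 16))
        else:
            flush()
            out.append(fragment[pos])
        pos = end
    flush()
    return "".join(out)


text = decode_rtf_text(r"caf\'e9 \u937?")   # "café Ω"
```

    A real RTF reader also has to track groups, \ansicpg, font charsets and
    \uc changes, which is where the gory details come in.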

    Antoine


