Re: XML Parser for Unicode Big Indian font MSWord document

From: Markus Scherer
Date: Mon Jan 19 2004 - 14:06:07 EST

    N. Ganesh Babu wrote:
    > I having XML file in Unicode-Big Indian font created in MS Word. Please

    I believe you mean that you have chosen to save a document in the "Unicode Big Endian" encoding
    scheme, formally known as UTF-16BE. An encoding is different from a font.

    > let me know whether we can parse the XML file as it is with the MS Word?
    > If yes please let me know the parser name.

    Every XML parser that conforms to XML 1.0 must be able to handle UTF-8 and UTF-16. For the latter,
    the XML 1.0 specification requires the document to begin with a Byte Order Mark (BOM), which tells
    the parser the byte order. I believe that Word includes the BOM when you save as "Unicode" or
    "Unicode Big-Endian".
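
    To check whether a saved file really starts with a BOM, one can look at its first two bytes.
    The following is only a minimal sketch in Python; the function name detect_utf16_bom and the
    sample bytes are made up for illustration, and it only distinguishes the two UTF-16 cases.

```python
import codecs


def detect_utf16_bom(data: bytes) -> str:
    """Return 'UTF-16BE', 'UTF-16LE', or 'none' based on the first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "UTF-16LE"
    return "none"


# Hypothetical sample standing in for a file saved as "Unicode Big-Endian":
# the big-endian BOM followed by big-endian UTF-16 code units.
sample = codecs.BOM_UTF16_BE + "<doc/>".encode("utf-16-be")
print(detect_utf16_bom(sample))
```

    For a real file, the same function could be applied to the bytes read with open(path, "rb").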

    Java 1.4 contains an XML parser.
    The Apache project provides the Xerces parser.
    There are many others.
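
    As one example of the "many others", here is a rough sketch using the parser in Python's
    standard library (xml.etree.ElementTree) rather than the Java parsers named above. The
    document text, the element name, and the helper parse_utf16_xml are assumptions made up for
    the illustration. The point is that the parser is given the raw bytes and works out the
    encoding from the BOM itself.

```python
import codecs
import xml.etree.ElementTree as ET


def parse_utf16_xml(data: bytes) -> ET.Element:
    """Parse raw XML bytes; the parser detects UTF-16 from the leading BOM."""
    # Pass bytes, not a decoded string, so the encoding detection happens
    # inside the parser instead of being done by hand.
    return ET.fromstring(data)


# Hypothetical document standing in for a file saved as "Unicode Big-Endian".
text = '<?xml version="1.0" encoding="UTF-16"?><greeting lang="hi">नमस्ते</greeting>'
data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

root = parse_utf16_xml(data)
print(root.tag, root.get("lang"))
```

    With a file on disk, ET.parse(path) would do the same job, since it also reads the bytes
    directly.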

    Spelling tip: big-endian, not "indian". From "end".

    Encoding etc.:

    I hope this helps,

    Opinions expressed here may not reflect my company's positions unless otherwise noted.
