Re: XML attribute normalization and Unicode in C language

From: Philippe Verdy
Date: Fri Jun 03 2005 - 15:15:28 CDT


    Why not use an XML parser to do this job?

    Using Xerces with the SAX interface to enumerate the various items will let
    you support many encodings (including UTF-8 and UTF-16). Then, in the
    callback that receives the parsed and isolated string items, you can apply
    a normalization function to transform them and generate the new XML
    document on the fly.

    It's really not complicated to do with the Xerces+ICU pair; it is
    essentially an example of a simple transformation of an XML document.

    You could use a DOM-based API as well. DOM requires parsing the whole
    document before you can browse the tree of elements and attributes to
    generate a new document, but one advantage is that DOM naturally
    "normalizes" the values of attributes and their relative order, in
    addition to resolving the various entities. This lets you, for example,
    normalize and unify the namespaces as well if you want to build a coherent
    set of XML files that use the same set of namespace prefixes.

    ----- Original Message -----
    From: "Mike Hao" <>
    To: <>
    Sent: Friday, June 03, 2005 6:41 AM
    Subject: XML attribute normalization and Unicode in C language

    > Hi All,
    > I am not sure if this is the right group to post my
    > question. I hope I can get some help or a hint from you.
    > I am working on a project that needs to normalize XML
    > attribute values using the C programming language. I
    > need to support UTF-8 and UTF-16 encodings. Currently I
    > cannot think of a good solution. Does anyone have
    > such an experience to share with me? Or could you tell
    > me the right way to do it?

    This archive was generated by hypermail 2.1.5 : Fri Jun 03 2005 - 15:16:18 CDT