From: Philippe Verdy (email@example.com)
Date: Fri Jun 03 2005 - 15:15:28 CDT
Why not use an XML parser to do this job?
Using Xerces with the SAX interface to enumerate the various items will
allow you to support lots of encodings (including UTF-8 and UTF-16). Then, in
the callback that receives the parsed and isolated string items, you can use
a normalization function to transform them and generate the new XML
document on the fly.
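As a rough sketch of that callback step in plain C (not actual Xerces or ICU code): the names normalize_attr_value, write_escaped and on_attribute are hypothetical, and the callback signature is an assumption rather than the Xerces SAX interface. The normalization shown is the XML 1.0 whitespace normalization for non-CDATA attributes, used here only as a stand-in; a Unicode normalization from ICU could take its place. It assumes the parser has already converted the value to UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the "normalization function" step.
   Works on already-parsed UTF-8 text: tab, CR and LF become spaces,
   runs of spaces collapse to one, leading/trailing spaces are dropped.
   UTF-8 multi-byte sequences pass through untouched because all their
   bytes are >= 0x80. Returns the length written, or (size_t)-1 if dst
   is too small. */
static size_t normalize_attr_value(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    int pending_space = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            pending_space = (n > 0);  /* ignore leading whitespace */
            continue;
        }
        if (pending_space) {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = ' ';
            pending_space = 0;
        }
        if (n + 1 >= cap) return (size_t)-1;
        dst[n++] = (char)c;
    }
    if (n >= cap) return (size_t)-1;
    dst[n] = '\0';
    return n;
}

/* Writes value with the characters that are unsafe inside a
   double-quoted attribute replaced by entity references. */
static void write_escaped(FILE *out, const char *value)
{
    for (; *value != '\0'; ++value) {
        switch (*value) {
        case '&': fputs("&amp;", out);  break;
        case '<': fputs("&lt;", out);   break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*value, out);   break;
        }
    }
}

/* Hypothetical callback, of the kind a SAX-style parser would invoke
   once per attribute: normalize the value, then emit it right away. */
static int on_attribute(FILE *out, const char *name, const char *value)
{
    char buf[256];

    if (normalize_attr_value(value, buf, sizeof buf) == (size_t)-1)
        return -1;  /* value too long for this sketch's fixed buffer */
    fprintf(out, " %s=\"", name);
    write_escaped(out, buf);
    fputc('"', out);
    return 0;
}
```

Calling on_attribute(stdout, "title", "  Hello \n World ") would write title="Hello World" preceded by a single space. Because each value is written as soon as it is received, nothing but the current value has to be held in memory, which is the point of the streaming approach.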
It's really not complicated to do with the Xerces+ICU pair, and it is an
example of a simple transformation of an XML document.
You could use a DOM-based API as well (but DOM requires parsing the whole
document before you can browse the elements and attributes tree to generate
a new document; one advantage is that DOM naturally "normalizes" the values
of attributes and their relative order, in addition to resolving the various
entities, allowing you for example to normalize and unify the namespaces as
well if you want to build a coherent set of XML files using the same set of
----- Original Message -----
From: "Mike Hao" <firstname.lastname@example.org>
Sent: Friday, June 03, 2005 6:41 AM
Subject: XML attribute normalization and Unicode in C language
> Hi All,
> I am not sure if this is the right group to post my
> question. Hope I can get some help or hint from you.
> I am working on a project that needs to normalize XML
> attribute values using the C programming language. I need
> to support UTF-8 and UTF-16 encodings. Currently I
> cannot think of a good solution to it. Does anyone have
> such experience to share with me? Or could you tell
> me what's the right way to do it?