RE: UCD in XML

From: Mark Davis (mark.davis@us.ibm.com)
Date: Thu May 10 2001 - 21:51:13 EDT


I had assumed that the parser would be rev'ed when I got IE 5.5 with the
latest patches. Is this always going to be an add-on, or will it be folded
in at some point?

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5892], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

"Michel Suignard" <michelsu@microsoft.com> on 05-10-2001 18:41:25

To: Mark Davis/Cupertino/IBM@IBMUS
cc: <unicode@unicode.org>, <unicore@unicode.org>
Subject: RE: UCD in XML

Mark, I answered that question for you a while ago. Our current
XML parser (msxml3.dll) parses Unicode 3.1 correctly. In fact, I edited
your file to verify this. And on a system like mine, with a surrogate
font installed, it will even display the surrogate characters (those
covered by that font) correctly.
It is only the previous XML parser (shipped originally with the OS) that
has the problem.
Please go to the Microsoft web site http://msdn.microsoft.com/xml/ to get
version 3 (which is conformant per your definition) or, if you are
brave, the preview version (version 4), which will read XML
Schema.
MSXML 3.0 has been available for over a year now and can be installed
either for standalone use or for use through IE (see the information on
the web site).

So there is no need to comment anything out (except, of course, the
#FFFF you left in) for the file to be readable by our current XML
parser. And the performance is quite acceptable given the size of the file.
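
For reference, here is a minimal Python sketch (an illustration added here, not code from either message) of the XML 1.0 Char production, which lists the code points a conformant parser must accept: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Under that rule U+FFF9..U+FFFD are legal, while U+FFFF and surrogate code points (U+D800..U+DFFF) are not; supplementary characters, which UTF-16 stores as surrogate pairs, are legal. The function name is_xml_char and the sample labels are hypothetical, and the sketch says nothing about how MSXML itself is implemented.

```python
# XML 1.0 "Char" production (XML 1.0 Recommendation, section 2.2):
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML_CHAR_RANGES = [
    (0x9, 0x9),          # tab
    (0xA, 0xA),          # line feed
    (0xD, 0xD),          # carriage return
    (0x20, 0xD7FF),      # BMP up to the surrogate block
    (0xE000, 0xFFFD),    # BMP after the surrogates; excludes U+FFFE and U+FFFF
    (0x10000, 0x10FFFF), # supplementary planes
]


def is_xml_char(cp: int) -> bool:
    """Return True if code point cp is allowed as a character in XML 1.0."""
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


if __name__ == "__main__":
    samples = {
        "U+FFF9 (interlinear annotation anchor)": 0xFFF9,
        "U+FFFD (replacement character)": 0xFFFD,
        "U+FFFF (noncharacter)": 0xFFFF,
        "U+D800 (surrogate code point)": 0xD800,
        "U+10400 (supplementary character)": 0x10400,
    }
    for label, cp in samples.items():
        print(f"{label}: {'allowed' if is_xml_char(cp) else 'NOT allowed'}")
```

By this rule, only the #FFFF entry (and any surrogate code points written as character references) would need to be removed for a conformant parser to read the file.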

Michel
PS: Please forward this to the Unicode list, as I will probably be blocked.

-----Original Message-----
From: Mark Davis [mailto:mark.davis@us.ibm.com]
Sent: Thu, May 10, 2001 5:58 PM
To: unicore@unicode.org
Cc: unicode@unicode.org
Subject: UCD in XML

Over the last month, several people have asked me about the XML version
of the Unicode Character Database that I presented at last November's
UTC meeting. I have posted it at http://www.macchiato.com/utc/UCD.zip,
which contains two files:

UCD.xml
UCD-Notes.htm

Caveats

1. I regenerated the file from the Unicode 3.1 data. However, (a) I
haven't done more than spot-check the results, and (b) the format
differs somewhat from what is documented in the notes.

2. I still have to comment out characters FFF9..FFFD and all
surrogates so that people can read the file with Internet Explorer (I
do wish they would use a conformant XML parser). Also, note that IE
takes quite a while to load the file.

Mark



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT