RE: Not out of the woods yet (was RE: UCD in XML)

From: Mark Davis (mark.davis@us.ibm.com)
Date: Fri May 11 2001 - 21:09:00 EDT


All my fault?

IE consistently garbles the XML in its error messages (it was garbled on
previous versions as well). Look at the XML line itself, which I had
included below the indented error message, as in my previous messages. I
reproduce it again, for your reading pleasure.

<e c='??' n='MATHEMATICAL BOLD DIGIT ONE'
 gc='Nd' dt='font' dm='1' nt='decimal' nv='1.0' lb='NU' bc='EN'
Other_Math='T' Hex_Digit='T'
/>

No "illegal values in attributes: space, &, ' and <".

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5892], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

"Michel Suignard" <michelsu@microsoft.com> on 05-11-2001 17:51:38

To: Mark Davis/Cupertino/IBM@IBMUS
cc: <unicode@unicode.org>, <unicore@unicode.org>
Subject: RE: Not out of the woods yet (was RE: UCD in XML)

Mark, this time it is all your fault. You are using illegal values in
attributes: space, &, ' and <. You have to use entities or NCRs for
those in attribute values, don't you? The parser is rightly rejecting
your file.
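
For reference, a minimal sketch of escaping attribute values before they are written out. XML 1.0 requires & and < to be escaped inside an attribute value, along with whichever quote character delimits it. The helper name ncr and the sample strings are made up for illustration; this is not the script that generated the UCD file.

```python
from xml.sax.saxutils import quoteattr


def ncr(ch):
    """Return a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


# quoteattr escapes &, < and > and wraps the value in a safe quote delimiter.
amp_attr = quoteattr("AT&T <test>")

# A supplementary character written as an NCR instead of raw UTF-8.
digit_attr = "'" + ncr(chr(0x1D7CF)) + "'"

print(f"<e c={amp_attr}/>")
print(f"<e c={digit_attr}/>")
```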

Note that you have to look at the line number, not the display, to figure
out where the problem is, as the display is asynchronous from the
parsing. Your problems were with characters 20000, E0000, E0026, etc.
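
One way to get the reported location without relying on a browser display is to ask a parser directly. The sketch below uses Python's expat binding rather than the MSXML parser discussed here, so the exact positions it reports may differ. The function name locate_error and the two sample documents are hypothetical.

```python
import xml.parsers.expat


def locate_error(xml_bytes):
    """Return (line, column offset) of the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset
    return None


# Hypothetical two-element documents: one with a bare & in an attribute value.
bad = b"<ucd>\n<e c='&' n='AMPERSAND'/>\n</ucd>"
good = b"<ucd>\n<e c='&amp;' n='AMPERSAND'/>\n</ucd>"

bad_location = locate_error(bad)
good_location = locate_error(good)
print(bad_location, good_location)
```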

Michel
(OK I hope it is the last of these messages, have a good week-end)

-----Original Message-----
From: Mark Davis [mailto:mark.davis@us.ibm.com]
Sent: Fri, May 11, 2001 5:15 PM
To: Michel Suignard
Cc: unicode@unicode.org; unicore@unicode.org
Subject: Not out of the woods yet (was RE: UCD in XML)

I tried that, and it now gets further. However, it blows up on a pure UTF-8
version (without the NCRs). I posted that version also on my site so you
can pick it up.

Error message is:

<e c="?" n="MATHEMATICAL BOLD DIGIT ONE" gc

     The XML page cannot be displayed

     Cannot view XML input using XSL style sheet. Please correct the error
     and then click the Refresh button, or try again later.

     An invalid character was found in text content. Line 37847, Position 7
     <e c='

     ="m">="Nd" dt="font" dm="1" nt="decimal" nv="1.0" lb="NU" bc="EN"
     Other_Math="T" Hex_Digit="T" />

     The problem appears to be on:

<e c='??' n='MATHEMATICAL BOLD DIGIT ONE'
 gc='Nd' dt='font' dm='1' nt='decimal' nv='1.0' lb='NU' bc='EN'
Other_Math='T' Hex_Digit='T'
/>

The code is 1D7CF, which in UTF-8 (according to my converter) is
F0 9D 9F 8F. I don't know why that particular value would tickle something.
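
The byte sequence can be double-checked independently of any particular converter. A small sketch using only the Python standard library (the variable names are arbitrary):

```python
import unicodedata

ch = chr(0x1D7CF)

# Character name from the Unicode Character Database bundled with Python.
name = unicodedata.name(ch)

# UTF-8 bytes as space-separated upper-case hex.
utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

print(name)
print(utf8_hex)
```

This agrees with the four bytes quoted above, so the file's encoding of that character is well-formed UTF-8.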

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5892], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014


