Re: Unicode Technical Report #22

From: Mark Davis (mark.davis@us.ibm.com)
Date: Thu Mar 20 2003 - 14:05:51 EST


    The only problem would come if you were trying to read a CharML
    file that *itself* was encoded using a character set that your XML parser
    didn't know. That's one reason for always encoding the CharML files
    themselves in UTF-8 or ASCII. I'll post this to a broader mailing list, since
    some others may have similar concerns.
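
    As a rough illustration of that convention, here is a minimal Python
    sketch of a loader that refuses any mapping file not declared as UTF-8
    or ASCII, so that reading the file never depends on a converter built
    from the file itself. The names declared_encoding, parse_mapping_file
    and BOOTSTRAP_ENCODINGS are hypothetical, and the declaration sniffing
    is deliberately simplified; it is not taken from ICU or any real parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these are the only encodings the parser can decode on its own,
# without consulting any externally loaded mapping table.
BOOTSTRAP_ENCODINGS = {"utf-8", "us-ascii", "ascii"}

# Simplified match for the encoding name in a leading XML declaration.
_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data: bytes) -> str:
    """Return the encoding the file claims to use, defaulting to UTF-8."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"  # UTF-16 byte order mark
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # skip a UTF-8 byte order mark
    match = _DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def parse_mapping_file(data: bytes) -> ET.Element:
    """Parse a mapping file only if it is declared as UTF-8 or ASCII."""
    encoding = declared_encoding(data)
    if encoding not in BOOTSTRAP_ENCODINGS:
        raise ValueError(
            f"mapping file declared as {encoding!r}; expected UTF-8 or ASCII"
        )
    return ET.fromstring(data)
```

    With this check in place, a mapping file that describes ISO-8859-1 can
    still be loaded, as long as the file itself is stored as UTF-8 or ASCII.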

    Mark
    ___
    mark.davis@us.ibm.com
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    From: "Claude Tardif" <intmktg@cam.org>
    To: Mark Davis/Cupertino/IBM@IBMUS
    cc: <marc@sitepak.com>
    Date: 2003.03.19 21:44
    Subject: Unicode Technical Report #22

    Your document referenced in the title of this message specifies an XML
    format for the interchange of mapping data for character encodings.
    Conversely, the Extensible Markup Language (XML) 1.0 (Second Edition)
    section 4.3.3 specifies an encoding declaration for stating the character
    encoding of XML entities. If character encoding uses XML and XML uses
    character encoding, there is necessarily an interdependency loop. For
    example, what if a conversion library such as ICU parsed character
    encoding files using an XML parser which itself used ICU to convert
    the character encoding of entities? Then, if the XML file defining the
    charset encoding for ISO-8859-1 contained the declaration <?xml
    encoding='ISO-8859-1'?>, this would cause a loop, as the file defining
    the encoding could never be parsed without the encoding itself.
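
    The loop described above can be sketched in a few lines of Python. This
    is only a toy model: MappingLoader and its methods are hypothetical and
    do not reflect how ICU or any XML parser is structured, and Python's own
    codecs stand in for the real table-driven conversion.

```python
class MappingLoader:
    """Toy converter that reads its mapping tables from encoded files."""

    # Assumption: encodings that can be decoded without loading any table.
    BUILTIN = {"utf-8", "ascii"}

    def __init__(self, files):
        # charset name -> (encoding declared by the mapping file, raw bytes)
        self.files = files
        self._in_progress = set()

    def decode(self, raw: bytes, encoding: str) -> str:
        """Decode raw bytes, loading the mapping table first if required."""
        if encoding not in self.BUILTIN:
            self.load(encoding)  # the table must be read before decoding
        # Stand-in for applying the loaded table.
        return raw.decode(encoding)

    def load(self, charset: str) -> str:
        """Return the text of the mapping file for charset."""
        if charset in self._in_progress:
            raise RuntimeError(
                f"{charset} is needed to read its own mapping file"
            )
        self._in_progress.add(charset)
        try:
            file_encoding, raw = self.files[charset]
            # The "XML parser" step: it asks the converter to decode the file.
            return self.decode(raw, file_encoding)
        finally:
            self._in_progress.discard(charset)
```

    Loading "iso-8859-1" from a file that is itself declared as ISO-8859-1
    raises the RuntimeError, while the same table stored in a file declared
    as UTF-8 loads without re-entering the loader.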

    My question is: Is there a way for a conversion library and XML parser
    to make use of their services mutually without causing such an
    interdependency loop and, preferably, without having such requirements
    as character encoding files not containing character encoding in
    entities?

    Marc Tardif


