Re: Fw: Biblical Hebrew: possible solution for XML

From: John Cowan
Date: Fri Jun 27 2003 - 13:17:06 EDT

    Philippe Verdy scripsit:

    > Given that XML will require normalization for texts identified as
    > being Unicode encoded (UTF-8 and others), couldn't a document be
    > labelled so that the normalization step be removed from the XML
    > processing, using a "ISO-10646-8" encoding name (for the UTF-8
    > encoding scheme)?

    No. The W3C rule is "Check normalization on input (parsing), create
    normalization on output (creating or transcoding)", and it applies to
    all encodings, since any character may be expressed in any encoding
    using character references.
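
    A minimal sketch of the character-reference point, assuming Python's
    standard library and taking NFC as the normalization form being
    checked (the W3C notion of normalization is somewhat broader than
    plain NFC). The sample document and the helper name is_nfc are
    illustrative, not from the thread. The document is declared as
    US-ASCII, yet a character reference still introduces a combining
    mark, so the parsed text is not normalized:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample: a pure US-ASCII document in which "e" is followed
# by U+0301 COMBINING ACUTE ACCENT, written as a character reference.
doc = b'<?xml version="1.0" encoding="US-ASCII"?><w>cafe&#x301;</w>'

# The parser expands the reference, so the text holds the combining mark.
text = ET.fromstring(doc).text


def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C (assumed form)."""
    return unicodedata.normalize("NFC", s) == s


print(is_nfc(text))                                # unnormalized input
print(is_nfc(unicodedata.normalize("NFC", text)))  # after normalizing
```

    Relabelling the encoding would not change this result, because the
    check runs on the characters produced after references are expanded,
    not on the bytes of the encoded document.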

    However, normalization checking is still a SHOULD even in XML 1.1, and at
    most a MAY (not actually mentioned at all) in XML 1.0, the current version.

    John Cowan
    "You cannot enter here.  Go back to the abyss prepared for you!  Go back!
    Fall into the nothingness that awaits you and your Master.  Go!" --Gandalf
