Fw: Biblical Hebrew: possible solution for XML

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 27 2003 - 12:50:23 EDT

    On Friday, June 27, 2003 6:01 PM, Philippe Verdy <verdy_p@wanadoo.fr> wrote:

    Given that XML will require normalization for texts identified as
    being Unicode encoded (UTF-8 and others), couldn't a document be
    labelled so that the normalization step is skipped in XML
    processing, using an "ISO-10646-8" encoding name (for the UTF-8
    encoding scheme)?

    In that case, the label would signal that the whole document does
    not adopt Unicode normalization, but still uses the same repertoire...
    (So this would optionally remove a processing step for XML parsers,
    which would apply normalization only on input, but not in their
    internal processing, and not even on output).

    Is this too tricky for the XML conformance requirements? Which body
    must adapt its standard? To me, a document can fully conform
    to ISO 10646 without conforming to Unicode if it does not want
    to use the /implied/ Unicode properties such as combining classes
    and Unicode normalization forms (and there are certainly other
    interesting normalizations that could be useful for each language)...
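
To make the concern concrete, here is a minimal sketch of how canonical
reordering by combining class can change the order of Hebrew points. It
assumes Python's standard unicodedata module; the sample sequence (lamed,
then patah, then hiriq) and the helper name describe() are illustrative
choices, not something taken from this thread.

```python
import unicodedata

# Illustrative sample (an assumption): HEBREW LETTER LAMED followed by
# HEBREW POINT PATAH and then HEBREW POINT HIRIQ, in that written order.
LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
original = LAMED + PATAH + HIRIQ


def describe(text):
    """Return (character name, canonical combining class) for each character."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]


# NFC applies canonical ordering: adjacent combining marks with different
# non-zero combining classes are sorted by ascending class value.
normalized = unicodedata.normalize("NFC", original)

print("as written:", describe(original))
print("after NFC: ", describe(normalized))
print("order preserved:", normalized == original)
```

If the two points carry different combining classes, the normalized
string no longer matches the order the author typed, which is the
distinction a processor that skips normalization would keep.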

    The caveat would be a more complex font layout engine (with
    larger tables for combining sequences) if texts can be encoded
    without being normalized first...

    -- Philippe.

