Re: compatibility characters (in XML context)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 14 2003 - 11:40:05 EST

    From: "Kent Karlsson" <kentk@cs.chalmers.se>
    > Philippe Verdy wrote:
    > > (1) a singleton (example the Angström symbol, canonically
    > > mapped to A with diaeresis,
    > The Ångström (note spelling) sign is canonically mapped to
    > capital a with ring.

    Besides the spelling (is it wrong to omit the ring in English? I don't have
    it on my keyboard), I should have reread my message. Of course I meant ring
    and not diaeresis (that one is above the o). Sorry, that's a typo.
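
    The corrected mapping can be checked against the Unicode Character
    Database. The following is only a minimal sketch, assuming Python and its
    standard unicodedata module (neither is part of the original discussion;
    the variable names are illustrative):

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"   # ANGSTROM SIGN
A_WITH_RING = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# An untagged (canonical) mapping to a single code point is a "singleton".
singleton_mapping = unicodedata.decomposition(ANGSTROM_SIGN)

# NFC never keeps the sign: it comes back as capital A with ring.
nfc_result = unicodedata.normalize("NFC", ANGSTROM_SIGN)

# NFD goes further, down to A followed by COMBINING RING ABOVE.
nfd_result = unicodedata.normalize("NFD", ANGSTROM_SIGN)

print(singleton_mapping)
print(nfc_result == A_WITH_RING)
print([f"U+{ord(c):04X}" for c in nfd_result])
```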

    > There are several meanings of "compatibility characters".
    >
    > The most important here are the characters that have a
    > compatibility decomposition mapping. For details,
    > see UTR 20: http://www.unicode.org/reports/tr20/.

    Yes, but those characters are NOT excluded from XML processing, which
    should also work with characters that have a compatibility decomposition,
    without affecting their supplementary meaning (wide, narrow, font, etc.).
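
    To make the "supplementary meaning" concrete: the tag at the start of a
    compatibility decomposition mapping records what folding would throw away.
    A small sketch, assuming Python's standard unicodedata module; the helper
    compatibility_tag and the three sample characters are my own illustrative
    choices:

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = {
    "\uFF21": "FULLWIDTH LATIN CAPITAL LETTER A",
    "\uFF76": "HALFWIDTH KATAKANA LETTER KA",
    "\u2102": "DOUBLE-STRUCK CAPITAL C",
}

def compatibility_tag(ch):
    """Return the <tag> of a compatibility decomposition, or None.

    Canonical decompositions carry no tag, and characters without any
    decomposition give an empty mapping, so both yield None here.
    """
    mapping = unicodedata.decomposition(ch)
    if mapping.startswith("<"):
        return mapping.split(">", 1)[0] + ">"
    return None

tags = {ch: compatibility_tag(ch) for ch in samples}
for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: {tags[ch]}")
```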

    > > And the "oe ligature" has only a compatibility decomposition,
    > > and then is not a compatibility character.
    >
    > The oe ligature characters have no decomposition at all.

    I thought it had one (it is used in French, where it is clearly a
    typographic ligature but is handled and sorted like two letters), as
    opposed to the ae ligature (which is a typographic ligature in French, but a
    true letter in other languages).
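
    Kent's point about the oe ligature is easy to verify. A minimal sketch,
    again assuming Python's standard unicodedata module; the fi ligature
    (U+FB01) is added only as a contrast and is not mentioned in the thread:

```python
import unicodedata

ligatures = {
    "OE": "\u0152",  # LATIN CAPITAL LIGATURE OE
    "oe": "\u0153",  # LATIN SMALL LIGATURE OE
    "AE": "\u00C6",  # LATIN CAPITAL LETTER AE
    "ae": "\u00E6",  # LATIN SMALL LETTER AE
}

# An empty string means the character has no decomposition mapping at all.
mappings = {name: unicodedata.decomposition(ch) for name, ch in ligatures.items()}

# Even the most aggressive decomposing form (NFKD) leaves them untouched.
unchanged = {
    name: unicodedata.normalize("NFKD", ch) == ch
    for name, ch in ligatures.items()
}

# Contrast: the fi ligature does have a compatibility decomposition.
fi_mapping = unicodedata.decomposition("\uFB01")

print(mappings)
print(unchanged)
print(fi_mapping)
```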

    > > > Is somewhere a complete chart of "compatibility characters" ?
    > >
    > > Look at the Unicode data file which lists composition exclusions...
    >
    > Which is unrelated to the question posed! See UTR 20 instead.

    I don't think that was the question... UTR 20 is indeed more precise, but
    some of the actions listed there are debatable. For example, "use list item"
    or "use <sub> markup" implies that the XML schema is HTML, but for general
    XML processing HTML is not there. Such actions should have been restricted
    to XHTML, and changed to "retain" in other cases.

    XML is not made only to represent text with markup, and XML conformance
    requires not performing unsafe actions without knowledge of the context in
    which the text is used. That's why the W3C recommends only the NFC form, and
    not the NFKC form...
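
    The difference is visible on a short string. This is only a sketch,
    assuming Python's standard unicodedata module; the sample text is made up
    for illustration:

```python
import unicodedata

# SUPERSCRIPT TWO (U+00B2) has a <super> compatibility decomposition to "2".
text = "E = mc\u00B2"

nfc_text = unicodedata.normalize("NFC", text)
nfkc_text = unicodedata.normalize("NFKC", text)

# NFC applies canonical mappings only, so the superscript survives.
print(nfc_text == text)

# NFKC also applies compatibility mappings and silently changes the meaning.
print(nfkc_text)
```

    Without knowing the document type, a generic XML processor cannot tell
    whether that second result is acceptable, which is the sense in which the
    folding is unsafe.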

    So, since UTR 20 is informative and XML conformance is normative, I would
    definitely not use UTR 20, which could break XML applications...

    For me, the title of this UTR is wrong: it should apply only to markup
    languages based on XML (including XHTML), not to XML as a whole. This
    also applies to the BiDi override controls, as there is no such "dir"
    attribute name in the core XML schema!


