Re: Nicest UTF

From: John Cowan (
Date: Fri Dec 10 2004 - 21:28:32 CST

    Philippe Verdy scripsit:

    > >Okay, I'm confused. Does ≮ open a tag? Does it matter if it's
    > >composed or decomposed?
    > It does not open a XML tag.
    > It does matter if it's composed (won't open a tag) or decomposed (will
    > open a tag, but with a combining character, invalid as an identifier
    > start)

    Let's be precise here. If the 7-character character sequence "&#8814;"
    appears in an XML document, it never opens a tag and it is never changed
    by normalization. If the 1-character sequence consisting of a single
    U+226E appears in an XML document, and that document is put through
    NF(K)D, U+226E decomposes to U+003C U+0338 and the document is no
    longer well-formed. However, NF(K)D is not recommended for XML
    documents, which should be in NFC.
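
    The distinction above can be checked with a short Python sketch using
    only the standard library. The names (code_points, is_well_formed,
    literal_doc, reference_doc) and the sample <p> documents are made up
    for illustration, and the parser behind xml.etree is only a stand-in
    for "an XML processor".

```python
import unicodedata
import xml.etree.ElementTree as ET

NOT_LESS_THAN = "\u226E"  # the single character U+226E NOT LESS-THAN


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents: one uses the 7-character reference,
# the other the literal 1-character sequence.
reference_doc = "<p>a &#8814; b</p>"
literal_doc = f"<p>a {NOT_LESS_THAN} b</p>"

# The character reference is plain ASCII, so no normalization form changes it.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, reference_doc) == reference_doc

# The literal character survives NFC but is decomposed by NFD and NFKD
# into "<" followed by U+0338 COMBINING LONG SOLIDUS OVERLAY.
nfc_doc = unicodedata.normalize("NFC", literal_doc)
nfd_doc = unicodedata.normalize("NFD", literal_doc)

print(code_points(unicodedata.normalize("NFD", NOT_LESS_THAN)))
print("reference:", is_well_formed(reference_doc))
print("literal, NFC:", is_well_formed(nfc_doc))
print("literal, NFD:", is_well_formed(nfd_doc))
```

    The NFD copy fails to parse because the decomposed "<" now looks like
    the start of a tag, and U+0338 cannot begin a name.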

    First known example of political correctness:   John Cowan
    "After Nurhachi had united all the other
    Jurchen tribes under the leadership of the
    Manchus, his successor Abahai (1592-1643)
    issued an order that the name Jurchen should       --S. Robert Ramsey,
    be banned, and from then on, they were all         The Languages of China
    to be called Manchus."
