Re: Nicest UTF

From: John Cowan (jcowan@reutershealth.com)
Date: Fri Dec 10 2004 - 19:26:22 CST


    Marcin 'Qrczak' Kowalczyk scripsit:

    > http://www.w3.org/TR/2000/REC-xml-20001006#charsets
    > implies that the appropriate level for parsing XML is code points.

    You are reading the XML Recommendation incorrectly. It is not defined
    in terms of code units (8-bit, 16-bit, or 32-bit) but in terms of
    characters. XML processors are required to accept UTF-8 and UTF-16,
    and may or may not accept other character encodings. But the internal
    model is that of characters, and surrogate code points (U+D800 to
    U+DFFF) are not characters, so they are not allowed.
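
    A small Python sketch of the point, assuming the Char production from
    the XML 1.0 Recommendation (the names is_xml_char and all_xml_chars are
    illustrative, not part of any library). The check applies to characters
    after decoding: a UTF-16 surrogate pair decodes to one legal character,
    while a lone surrogate code point is rejected.

```python
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Note that the surrogate block D800-DFFF falls in the gap between ranges.

def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def all_xml_chars(text: str) -> bool:
    """True if every character of an already-decoded string is a legal Char."""
    return all(is_xml_char(ord(ch)) for ch in text)

# U+1D11E MUSICAL SYMBOL G CLEF: one character, encoded as two UTF-16
# code units (a surrogate pair) or four UTF-8 code units.
clef = "\U0001D11E"
utf16 = clef.encode("utf-16-be")
utf8 = clef.encode("utf-8")
print(len(clef), len(utf16) // 2, len(utf8))  # 1 2 4

# Decoding the surrogate pair yields a single character, which is allowed.
print(all_xml_chars(utf16.decode("utf-16-be")))  # True

# A lone surrogate code point is not a character, so it is rejected.
print(is_xml_char(0xD834))      # False
print(all_xml_chars("\ud834"))  # False
```

    The validity test is written against decoded code points on purpose:
    whether the input arrived as UTF-8 or UTF-16 code units is settled
    before the Char production is ever consulted.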

    -- 
    John Cowan  www.reutershealth.com  www.ccil.org/~cowan  jcowan@reutershealth.com
    Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
    The day and hour soon are coming / When all the IT folks say "Gosh!"
    It isn't from a clever lawsuit / That Windowsland will finally fall,
    But thousands writing open source code / Like mice who nibble through a wall.
            --The Linux-nationale by Greg Baker
    


    This archive was generated by hypermail 2.1.5 : Fri Dec 10 2004 - 19:27:54 CST