Re: Conformance

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 16:58:16 CST

    "Richard T. Gillam" <rgillam@las-inc.com> writes:

    > For that matter, applications that use the full panoply of
    > signature-byte sequences (0000FEFF for UTF-32BE, FFFE0000 for UTF-32LE,
    > FEFF for UTF-16BE, FFFE for UTF-16LE, EF BB BF for UTF-8, etc.) to
    > determine whether a byte stream is Unicode and what Unicode encoding
    > scheme it is are also implementing a higher-level protocol based on
    > Unicode.

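    Strictly speaking, they can't reliably distinguish UTF-32LE from UTF-16LE:
    the UTF-32LE signature FF FE 00 00 is byte-for-byte the same as the
    UTF-16LE signature FF FE followed by U+0000.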

    In practice U+0000 as the first character after the marker is rare,
    so perhaps the problem can be ignored...
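
    To make the ambiguity concrete, here is a minimal Python sketch of
    signature sniffing. The function name sniff_signature and the
    longest-match-first ordering are assumptions chosen for illustration,
    not something prescribed by the standard or by the applications
    discussed above; the byte constants come from Python's codecs module.

```python
import codecs

# Signatures are checked longest-first, so a 4-byte UTF-32 mark wins over
# the 2-byte UTF-16 mark that it begins with. This ordering is a design
# choice of this sketch, not a rule from the Unicode Standard.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_signature(data: bytes):
    """Return the encoding scheme named by a leading signature, or None."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name
    return None


# UTF-16LE text whose first character after the marker is U+0000:
# the bytes are FF FE 00 00 41 00, which begin with the UTF-32LE signature.
ambiguous = codecs.BOM_UTF16_LE + "\x00A".encode("utf-16-le")

# Ordinary UTF-16LE text: FF FE 41 00, which is not ambiguous.
ordinary = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
```

    With this ordering, sniff_signature(ambiguous) reports UTF-32LE even
    though the bytes were produced as UTF-16LE, while
    sniff_signature(ordinary) reports UTF-16LE. Checking the 2-byte marks
    first would only move the problem: genuine UTF-32LE would then be
    taken for UTF-16LE.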

    -- 
       __("<         Marcin Kowalczyk
       \__/       qrczak@knm.org.pl
        ^^     http://qrnik.knm.org.pl/~qrczak/
    

