Re: Conformance

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 16:58:16 CST

Next message: Rick McGowan: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Marcin 'Qrczak' Kowalczyk: "Re: Subject: Re: 32'nd bit & UTF-8"
In reply to: Richard T. Gillam: "RE: Conformance (was UTF, BOM, etc)"
Next in thread: Rick McGowan: "Re: Subject: Re: 32'nd bit & UTF-8"

"Richard T. Gillam" <rgillam@las-inc.com> writes:

> For that matter, applications that use the full panoply of
> signature-byte sequences (0000FEFF for UTF-32BE, FFFE0000 for UTF-32LE,
> FEFF for UTF-16BE, FFFE for UTF-16LE, EF BB BF for UTF-8, etc.) to
> determine whether a byte stream is Unicode and what Unicode encoding
> scheme it is are also implementing a higher-level protocol based on
> Unicode.

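Strictly speaking they can't reliably distinguish UTF-32LE from UTF-16LE:
the UTF-32LE signature FF FE 00 00 is byte-for-byte the UTF-16LE signature
FF FE followed by U+0000.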

In practice U+0000 as the first character after the marker is rare,
so perhaps the problem can be ignored...
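
To make the ambiguity concrete, here is a minimal sketch of such a signature
sniffer in Python. The function name sniff_encoding and the longest-first
ordering of the table are my own assumptions for illustration, not something
the standard prescribes. The byte constants come from Python's codecs module.

```python
import codecs

# Signatures ordered longest first, so the UTF-32LE mark (FF FE 00 00)
# is tested before the UTF-16LE mark (FF FE) that it begins with.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_encoding(data):
    """Guess the encoding scheme from a leading signature (hypothetical helper).

    Returns the scheme name, or None when no signature matches.
    """
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name
    return None


# UTF-16LE text whose first character after the marker is U+0000:
# the bytes are FF FE 00 00 41 00, which start with the UTF-32LE signature.
ambiguous = codecs.BOM_UTF16_LE + "\x00A".encode("utf-16-le")
```

With this ordering, sniff_encoding(ambiguous) reports UTF-32LE even though
the bytes were produced as UTF-16LE. Reversing the order would instead
misreport every genuine UTF-32LE stream as UTF-16LE, so the sniffer has to
pick one reading and accept that the other case is misdetected.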

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 17:02:01 CST