From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 16:58:16 CST
"Richard T. Gillam" <rgillam@las-inc.com> writes:
> For that matter, applications that use the full panoply of
> signature-byte sequences (0000FEFF for UTF-32BE, FFFE0000 for UTF-32LE,
> FEFF for UTF-16BE, FFFE for UTF-16LE, EF BB BF for UTF-8, etc.) to
> determine whether a byte stream is Unicode and what Unicode encoding
> scheme it is are also implementing a higher-level protocol based on
> Unicode.
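Strictly speaking, they can't reliably distinguish UTF-32LE from UTF-16LE:
the bytes FF FE 00 00 are either the UTF-32LE signature or the UTF-16LE
signature followed by U+0000. In practice U+0000 as the first character
after the marker is rare, so perhaps the problem can be ignored...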
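
To make the ambiguity concrete, here is a minimal sketch in Python. The
names SIGNATURES and sniff_bom are made up for illustration and do not come
from any particular library; the sketch assumes a detector that tests the
longer signatures first, which is one common way to write one.

```python
# Hypothetical BOM sniffer (illustrative names, not a real library API).
# Longer signatures are listed first so that FF FE 00 00 is tried
# before the shorter FF FE.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return the name of the first matching signature, or None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


# UTF-16LE text whose first character after the marker is U+0000:
# bytes FF FE 00 00 41 00.
utf16_stream = "\ufeff\u0000A".encode("utf-16-le")

# UTF-32LE text with a marker: bytes FF FE 00 00 41 00 00 00.
utf32_stream = "\ufeffA".encode("utf-32-le")

# Both streams share the same first four bytes, so the sniffer
# necessarily gives the same answer for both.
print(sniff_bom(utf16_stream), sniff_bom(utf32_stream))
```

Reversing the order of the two little-endian entries does not help: every
UTF-32LE stream would then be reported as UTF-16LE. The first four bytes
alone simply do not carry enough information.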
-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/