From: Doug Ewell (email@example.com)
Date: Mon Feb 18 2008 - 13:17:10 CST
Srikrishna Erra wrote:
> For the UTF-16 encoding scheme, the BOM specifies how a file should be
> serialized, i.e., if BOM=FEFF then the file should use big-endian byte
> serialization (most significant byte first), and if BOM=FFFE then the
> file should use little-endian byte serialization (least significant
> byte first); the unmarked form (no BOM) uses big-endian byte
> serialization by default.
> So LE and BE input files are supposed to be processed on LE and BE
> platforms respectively. When input scripts of the wrong endianness are
> given, i.e., an LE script on a BE platform or vice versa, the
> application should terminate with an error.
> Here, one of my applications is allowing BE scripts on LE platforms and
> vice versa, so I need clarification.
Conformance clause C11 directs you to definition D97, which doesn't
really shed any additional light on this, but definition D98 directly
below it says:
"The UTF-16 encoding scheme may or may not begin with a BOM. However,
when there is no BOM, and in the absence of a higher-level protocol, the
byte order of the UTF-16 encoding scheme is big-endian."
I think your question is governed by that "higher-level protocol"
clause. It depends on what "input scripts" means in your context (plain
text files?), but if you have an application that reads files in the FOO
file format, or input streams using the FOO protocol, which specifies
that UTF-16 text is little-endian and has no BOM, then it is OK and
expected to read this data correctly even on a big-endian architecture.
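
To make that concrete, here is a rough sketch in Python. The "FOO" format
is the made-up example from above, and read_foo_text is a name invented
purely for illustration. The point is only that the byte order comes from
the format's definition, not from the machine running the code:

```python
import sys

def read_foo_text(data: bytes) -> str:
    # Hypothetical FOO format: UTF-16 text, little-endian, no BOM.
    # The codec name fixes the byte order; sys.byteorder is never consulted,
    # so the result is the same on big-endian and little-endian hardware.
    return data.decode("utf-16-le")

sample = b"\x48\x00\x69\x00"  # "Hi", least significant byte first
print(sys.byteorder, read_foo_text(sample))
```

The platform's native order is printed only to show that it plays no part
in the decoding.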
If you have a UTF-16 file beginning with a BOM, that is not declared to
be UTF-16LE or UTF-16BE, then D98 says you can use the byte orientation
of the BOM as a clue to interpret the file as big-endian or
little-endian. You can choose to be robust and use this clue to read
both types of files, instead of rejecting the one that doesn't match
the platform's byte order.
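
A minimal sketch of that robust approach, again in Python and assuming no
higher-level protocol is in play (decode_utf16 is a hypothetical helper
name, not anything the standard defines):

```python
import codecs

def decode_utf16(data: bytes) -> str:
    # Use the BOM, if present, as the byte-order clue and strip it.
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    # No BOM and no higher-level protocol: D98 says big-endian.
    return data.decode("utf-16-be")

print(decode_utf16(b"\xfe\xff\x00H\x00i"))  # big-endian with BOM
print(decode_utf16(b"\xff\xfeH\x00i\x00"))  # little-endian with BOM
print(decode_utf16(b"\x00H\x00i"))          # unmarked, read as big-endian
```

The explicit big-endian fallback is deliberate. Python's generic "utf-16"
codec falls back to the machine's native byte order when there is no BOM,
which is not what D98 describes.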
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages