From: John Cowan (jcowan@reutershealth.com)
Date: Sat Nov 02 2002 - 21:13:31 EST
Tex Texin scripsit:
> Interestingly, although I didn't study it in detail, looking at rfc 2376
> for prioritization over charset conflicts, it seems to recommend
> stripping the BOM when converting from utf-16 to other charsets (and
> without considering that ucs-4 would like to keep it). (section 5).
The point is not to try to convert it into a U+FEFF character or some
replacement thereof, like, say, "?".
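A minimal Python sketch of the difference, assuming big-endian UTF-16 input; `convert` is a hypothetical helper, not taken from RFC 2376. It decodes with the explicit `utf-16-be` codec, which keeps the BOM as U+FEFF, so you can see what converting it (rather than stripping it) produces in the target charset.

```python
import codecs

def convert(data: bytes, target: str, strip_bom: bool = True) -> bytes:
    """Convert big-endian UTF-16 bytes to `target` (hypothetical helper).

    Assumes the input is UTF-16BE; the explicit "utf-16-be" codec does not
    consume a BOM, so a leading FE FF decodes to U+FEFF.
    """
    text = data.decode("utf-16-be")
    if strip_bom and text.startswith("\ufeff"):
        text = text[1:]  # drop the BOM instead of converting it
    # errors="replace" turns characters the target cannot represent into "?"
    return text.encode(target, errors="replace")

doc = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")
print(convert(doc, "latin-1"))                   # b'<a/>'
print(convert(doc, "latin-1", strip_bom=False))  # b'?<a/>'
```

Without stripping, a Latin-1 target gets the "?" replacement, and a UTF-8 target would get the bytes EF BB BF (U+FEFF) in front of the content.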
> Also, in considering charset conflicts, 2376 fails to consider conflicts
> between signature and the encoding declaration. (I have a utf-16BE BOM
> and the encoding declaration is for utf-8...).
The encoding declaration is supposed to trump all. So it is UTF-8, and
since the BOM bytes 0xFE and 0xFF are both illegal in UTF-8, you blow
chunks...
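A rough Python illustration of that failure, assuming a strict decoder; `sniff_bom` and `decode_by_declaration` are hypothetical helpers, not any XML parser's API. The declaration says UTF-8, so the bytes are decoded as UTF-8 and the very first byte, 0xFE, is rejected.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Report the encoding implied by a leading BOM (UTF-8/UTF-16 only)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None

def decode_by_declaration(data: bytes, declared: str) -> str:
    """Decode using the declared encoding only, ignoring any BOM.

    Strict decoding (the default) raises UnicodeDecodeError on bytes
    that cannot occur in the declared encoding.
    """
    return data.decode(declared)

# A UTF-16BE BOM followed by a declaration claiming UTF-8.
doc = codecs.BOM_UTF16_BE + '<?xml version="1.0" encoding="utf-8"?><a/>'.encode("utf-16-be")
print(sniff_bom(doc))  # utf-16-be
try:
    decode_by_declaration(doc, "utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 0 invalid start byte
```

A real XML processor also uses the BOM and the first bytes of the document to decide how to read the declaration itself (XML 1.0, Appendix F); this sketch skips that step to show only the conflict.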
> I'll have to check for a more up-to-date rfc.
There is none.
--
John Cowan <jcowan@reutershealth.com>
http://www.reutershealth.com
http://www.ccil.org/~cowan
I amar prestar aen, han mathon ne nen,
han mathon ne chae, a han noston ne 'wilith.
  --Galadriel, _LOTR:FOTR_