From: Tex Texin (tex@i18nguy.com)
Date: Sat Nov 02 2002 - 21:04:27 EST
John Cowan wrote:
>
> Tex Texin scripsit:
>
> > So when the parser gets JOECODE, I can understand ignoring the signature
> > and autodetection, but exactly how does it find the first "<"?
>
> Well, if it begins with an 00 byte, it can't be UTF-8 or UTF-16 (it might
> be UTF-32 big-endian, but we'll suppose the parser can't handle that).
> JOECODE is what's left. At worst it is in some other encoding and/or
> not well-formed, in which case you expect an error and you get one.
> Of course the processor knows that "<" is encoded as 0xFF in JOECODE....
>
> The point is that signatures don't decode to a character: processors in
> general, not just XML processors, are expected to skip them.
>
> > It must have to try all of the encodings known to it... ugh.
>
> In such a bad case, that's all you can do.
John,
The bad case is what I was whinging about, since most processors deal
with more than 3 encodings. I do realize that, because the initial
characters are fixed, autodetection is ultimately not as bad as it is
for plain text.
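
To make the "initial characters are fixed" point concrete, here is a minimal sketch in Python of that kind of sniffing. The function name `sniff_encoding` and both tables are my own illustration, not any particular parser's code, and the ASCII-compatible case is only a guess until the encoding declaration has been read.

```python
from typing import Optional, Tuple

# Signatures (BOMs), longest first so that UTF-32 LE is tested before
# UTF-16 LE, whose signature is a prefix of it.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# How "<" or "<?" looks at the start of a document that has no signature.
LEADING_PATTERNS = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    # Assumption: treat any ASCII-compatible start as UTF-8 for now;
    # the encoding declaration may say otherwise.
    (b"<?xm", "utf-8"),
]


def sniff_encoding(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding guess, number of signature bytes to skip)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    for pattern, name in LEADING_PATTERNS:
        if data.startswith(pattern):
            return name, 0
    # Nothing recognized: this is the bad case, where the processor can
    # only try the other encodings it knows, one by one.
    return None, 0
```

A document in something like JOECODE, where "<" is 0xFF, falls through both tables and returns `(None, 0)`.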
Interestingly, although I didn't study it in detail, RFC 2376 (section 5),
which I was reading for how it prioritizes conflicting charset
information, seems to recommend stripping the BOM when converting from
UTF-16 to other charsets, without considering that UCS-4 would like to
keep it.
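
As a rough illustration of what stripping the BOM on conversion means in practice, here is a small Python sketch. The helper name `transcode_from_utf16` is hypothetical, and the behaviour shown is that of Python's codecs, not anything the RFC mandates.

```python
import codecs


def transcode_from_utf16(data: bytes, target: str) -> bytes:
    """Convert UTF-16 bytes (with a BOM) to another charset."""
    # Python's generic "utf-16" codec consumes a leading BOM and uses it
    # to choose the byte order, so the BOM never reaches the text.
    text = data.decode("utf-16")
    return text.encode(target)


src = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")

# Converting to a single-byte charset: the signature is gone.
latin1 = transcode_from_utf16(src, "iso-8859-1")

# Converting to UTF-32: Python's generic "utf-32" codec writes a fresh
# signature in native byte order, which is the case where one wants it kept.
utf32 = transcode_from_utf16(src, "utf-32")
```

By contrast, decoding the same bytes with the explicit `utf-16-be` codec leaves U+FEFF in the text as an ordinary character, and that character cannot be encoded in ISO-8859-1.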
Also, in considering charset conflicts, RFC 2376 fails to consider
conflicts between the signature and the encoding declaration (for
example, I have a UTF-16BE BOM and the encoding declaration says UTF-8).
I'll have to check for a more up-to-date RFC.
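
The signature-versus-declaration conflict can at least be detected mechanically. The following Python sketch only reports the mismatch and takes no position on which side should win. The function name, the 200-byte look-ahead, and the name-matching rule are all assumptions made for illustration.

```python
import codecs
import re
from typing import Optional, Tuple

BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an XML declaration.
DECL_RE = re.compile(r'<\?xml[^>]*?encoding\s*=\s*["\']([^"\']+)["\']')


def _key(name: str) -> str:
    # Compare names loosely: "UTF-16BE", "utf_16_be" and "utf-16-be" agree.
    return name.lower().replace("-", "").replace("_", "")


def check_signature_vs_declaration(
    data: bytes,
) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (encoding from BOM, declared encoding, conflict flag)."""
    bom_name, skip = None, 0
    for bom, name in BOMS:
        if data.startswith(bom):
            bom_name, skip = name, len(bom)
            break
    # Decode a short prefix using the BOM's encoding; without a BOM,
    # assume an ASCII-compatible encoding just to read the declaration.
    head = data[skip:skip + 200].decode(bom_name or "latin-1", errors="replace")
    match = DECL_RE.match(head)
    declared = match.group(1) if match else None
    if bom_name is None or declared is None:
        return bom_name, declared, False
    b, d = _key(bom_name), _key(declared)
    # A declared "UTF-16" is consistent with either byte order.
    consistent = b == d or b in (d + "be", d + "le")
    return bom_name, declared, not consistent
```

For the case described above (a UTF-16BE BOM followed by a declaration of UTF-8), this returns `("utf-16-be", "utf-8", True)`.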
All in all I agree with you and Michka (yes, you were right and I was
wrong, Michael!) that it isn't that big a deal to support a variety of
BOMs, but the world did not need yet another way to sometimes (maybe it's
there), almost (maybe it's unique), redundantly (one hopes it's redundant
and not conflicting) declare an encoding.
tex
-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------