Re: Odd "Unicode" Charset

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Sat, 16 Nov 2013 18:20:05 +0100

Sounds like “Bush hid the facts”:
http://en.wikipedia.org/wiki/Bush_hid_the_facts

Per the charset decoding algorithm of HTML5, the charset label
"unicode" ought to be interpreted as synonymous with "UTF-16" (the
WHATWG Encoding Standard maps both labels to little-endian UTF-16,
i.e. UTF-16LE).
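A minimal sketch of that label lookup, assuming only a small subset of
the Encoding Standard's label table. The helper name resolve_label and
the particular labels included are illustrative; this is not how any
mail client is known to implement it.

```python
from typing import Optional

# Illustrative subset of the Encoding Standard's label table (not exhaustive).
LABELS = {
    # Labels for UTF-16LE: note that both "unicode" and "utf-16" land here.
    **dict.fromkeys(["csunicode", "iso-10646-ucs-2", "ucs-2", "unicode",
                     "unicodefeff", "utf-16", "utf-16le"], "UTF-16LE"),
    **dict.fromkeys(["unicodefffe", "utf-16be"], "UTF-16BE"),
    **dict.fromkeys(["unicode-1-1-utf-8", "utf-8", "utf8"], "UTF-8"),
}

ASCII_WHITESPACE = "\t\n\f\r "


def resolve_label(label: str) -> Optional[str]:
    """Map a charset label to an encoding name, or None if it is unknown."""
    # Trim ASCII whitespace, then lowercase (the spec uses ASCII lowercase;
    # str.lower behaves the same for these ASCII-only labels).
    key = label.strip(ASCII_WHITESPACE).lower()
    return LABELS.get(key)


print(resolve_label("unicode"))   # UTF-16LE
print(resolve_label(" UTF-16 "))  # UTF-16LE
```

Under this table, charset="unicode" in a Content-Type header would be
decoded exactly as charset="utf-16" would.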

The baffling thing, per the same algorithm, is that if the HTML parser
sees the label "UTF-16" before it has picked the encoding, it ought to
switch to UTF-8 instead. (This is because, if the content really were
UTF-16, the encoding would already have been chosen before the parser
could detect the charset label.) Maybe OS X Mail doesn't implement that.
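The override can be sketched roughly as below. This is a simplification
under stated assumptions: choose_encoding and the windows-1252 fallback
are hypothetical, the in-document label is assumed to be already
resolved to an encoding name, and the real algorithm has further steps
(transport-layer charset, prescan limits, other overrides) that are
left out here.

```python
from typing import Optional

# Byte order marks checked first; a BOM decides the encoding outright.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def choose_encoding(data: bytes, meta_encoding: Optional[str],
                    default: str = "windows-1252") -> str:
    """Pick an encoding from a BOM, else from an in-document label."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if meta_encoding is not None:
        # If the label could be read by an ASCII-compatible scan, the bytes
        # cannot really be UTF-16, so a UTF-16 label is treated as UTF-8.
        if meta_encoding in ("UTF-16BE", "UTF-16LE"):
            return "UTF-8"
        return meta_encoding
    return default


print(choose_encoding(b'<meta charset="utf-16">', "UTF-16LE"))  # UTF-8
print(choose_encoding(b"\xff\xfea\x00", None))                   # UTF-16LE
```

The point of the design is ordering: genuine UTF-16 content is caught
by the BOM check before any label is consulted.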

Leif

Tom Gewecke, Sat, 16 Nov 2013 09:18:04 -0700:
> Recently when troubleshooting an email problem for a Mac user, I came
> across an email with Content-Type charset="unicode". I had not seen
> this before. OS X Mail was reading it as Chinese text instead of
> Latin.
> I did find something like this on the IANA list and understand there
> is an RFC from 1994 that provides info about it:
>
> http://tools.ietf.org/html/rfc1641
>
> which I think indicates that utf-16 is the correct interpretation.
> However Mail seems to get the bytes backwards, so 0061 a gets read as
> 6100 愀.
>
> Does anyone know whether charset="unicode" is at all normal these days?
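Tom's 0061 to 6100 observation is what a UTF-16 byte-order mismatch
produces. The message's raw bytes aren't shown, so it is unknown which
side got the order wrong; this sketch (misread is a hypothetical helper)
shows that either mismatch turns "a" into U+6100.

```python
from typing import Tuple


def misread(text: str) -> Tuple[str, str]:
    """Encode text in one UTF-16 byte order and decode it in the other."""
    be_bytes = text.encode("utf-16-be")  # "a" -> b"\x00a"
    le_bytes = text.encode("utf-16-le")  # "a" -> b"a\x00"
    # Reading big-endian bytes as little-endian, and vice versa.
    return be_bytes.decode("utf-16-le"), le_bytes.decode("utf-16-be")


be_as_le, le_as_be = misread("a")
print(hex(ord(be_as_le)), hex(ord(le_as_be)))  # 0x6100 0x6100
```

Because the label carries no byte order and a bare "UTF-16" stream may
lack a BOM, sender and reader can easily disagree on the order.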
Received on Sat Nov 16 2013 - 11:22:29 CST

This archive was generated by hypermail 2.2.0 : Sat Nov 16 2013 - 11:22:30 CST