Re: Odd "Unicode" Charset

From: Leif Halvard Silli <>
Date: Sat, 16 Nov 2013 18:20:05 +0100

Sounds like “Bush hid the facts”:

Per the charset decoding algorithm of HTML5, the charset label
'unicode' ought to be interpreted as synonymous with 'UTF-16'.
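
As a rough illustration of that label lookup, here is a minimal Python
sketch. The table is deliberately partial and the name
encoding_for_label is made up for this example; as far as I read the
Encoding Standard that HTML5 relies on, both 'unicode' and 'utf-16'
are labels for little-endian UTF-16.

```python
# Partial, illustrative label table (assumption: not the full list from the spec).
LABELS = {
    "unicode": "utf-16le",
    "utf-16": "utf-16le",
    "utf-16le": "utf-16le",
    "utf-16be": "utf-16be",
    "utf-8": "utf-8",
    "utf8": "utf-8",
}


def encoding_for_label(label):
    """Return the encoding name for a charset label, or None if unknown."""
    # Labels are compared case-insensitively after trimming ASCII whitespace.
    return LABELS.get(label.strip(" \t\n\f\r").lower())
```

With this table, encoding_for_label("Unicode") and
encoding_for_label("UTF-16") resolve to the same encoding name.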

The baffling thing, per the same algorithm, is that if the HTML parser
sees the label "UTF-16" before it has picked the encoding, then it
ought to switch to UTF-8. (This is because, if the content really were
UTF-16, the encoding would have been chosen before the parser detected
the charset label.) Maybe OS X Mail doesn't implement that.
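
That override rule could be sketched like this in Python. The function
name encoding_from_meta is hypothetical, and the sketch only covers
the UTF-16 case discussed here, not the rest of the algorithm:

```python
def encoding_from_meta(label_encoding):
    """Apply the UTF-16 override to an encoding found via an in-document label.

    `label_encoding` is the encoding name the label resolved to
    (hypothetical helper, illustrating only this one rule).
    """
    # If the parser could read the label as ASCII-compatible bytes,
    # the document cannot actually be UTF-16, so fall back to UTF-8.
    if label_encoding in ("utf-16le", "utf-16be"):
        return "utf-8"
    return label_encoding
```

So a label that resolves to UTF-16 at this stage ends up as UTF-8,
while any other encoding passes through unchanged.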


Tom Gewecke, Sat, 16 Nov 2013 09:18:04 -0700:
> Recently when troubleshooting an email problem for a Mac user, I came
> across an email with Content-Type charset="unicode". I had not seen
> this before. OS X Mail was reading it as Chinese text instead of
> Latin.
> I did find something like this on the IANA list and understand there
> is an RFC from 1994 that provides info about it:
> which I think indicates that utf-16 is the correct interpretation.
> However Mail seems to get the bytes backwards, so 0061 a gets read as
> 6100 愀.
> Does anyone know whether charset="unicode" is at all normal these days?
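
The byte swap described in the quoted message can be reproduced with a
short Python sketch. It assumes the message bytes were little-endian
UTF-16 without a byte order mark and were decoded as big-endian; the
function name misread_le_as_be is made up for this illustration.

```python
def misread_le_as_be(text):
    """Encode text as little-endian UTF-16, then decode it as big-endian."""
    le_bytes = text.encode("utf-16-le")  # "a" becomes the bytes 61 00
    return le_bytes.decode("utf-16-be")  # 61 00 is then read as U+6100


swapped = misread_le_as_be("a")
print(f"U+{ord(swapped):04X}")  # prints U+6100
```

U+6100 is the character 愀, which matches the symptom of Latin text
showing up as Chinese.
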
Received on Sat Nov 16 2013 - 11:22:29 CST
