RE: Odd "Unicode" Charset

From: Jonathan Rosenne <>
Date: Sat, 16 Nov 2013 19:53:42 +0200

"Does anyone know whether charset="unicode" is at all normal these days?"

Regardless of which character set is in question, you would be surprised
at how much material out there is encoded in any number of encodings and
character sets. It is not productive to drop support for any charset
allowed by the HTML standards, especially as handling most of them is not
that expensive.

Best regards,
Jonathan (Jony) Rosenne

-----Original Message-----
From: [] On
Behalf Of Steffen Daode Nurpmeso
Sent: Saturday, November 16, 2013 7:33 PM
To: Tom Gewecke
Cc: Unicode Discussion
Subject: Re: Odd "Unicode" Charset

Tom Gewecke <> wrote:
 |which I think indicates that utf-16 is the correct interpretation. \

I read this as UTF-16BE:

  This character set is encoded as sequences of octets, two per
  16-bit character, with the most significant octet first. Text
  with an odd number of octets is ill-formed.

  Rationale. ISO/IEC 10646-1:1993(E) specifies that when
  characters in the UCS-2 form are serialized as octets, that the
  most significant octet appear first.
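
To make the reading above concrete, here is a minimal sketch in Python of
what that text describes: two octets per 16-bit character, most significant
octet first, with an odd octet count rejected as ill-formed. The function
name decode_ucs2_be is hypothetical, and the sketch assumes plain UCS-2
input (no byte order mark, no surrogate pairs).

```python
def decode_ucs2_be(octets: bytes) -> str:
    """Decode octets as 16-bit characters, most significant octet first.

    Hypothetical helper for illustration; assumes UCS-2 input without a
    byte order mark and without surrogate pairs.
    """
    # Text with an odd number of octets is ill-formed.
    if len(octets) % 2 != 0:
        raise ValueError("ill-formed: odd number of octets")

    chars = []
    for i in range(0, len(octets), 2):
        # The first octet of each pair holds the high 8 bits.
        unit = (octets[i] << 8) | octets[i + 1]
        chars.append(chr(unit))
    return "".join(chars)
```

For example, the octets 00 48 00 69 decode to "Hi" under this reading. For
such input the result should match Python's built-in
bytes.decode("utf-16-be").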

 |Does anyone know whether charset="unicode" is at all normal these days?

If you ask me -- at the minimum over the wire this is and ever was a
terroristic charset. Just my one cent.

Received on Sat Nov 16 2013 - 11:55:23 CST
