From: Markus Scherer (markus.icu@gmail.com)
Date: Thu Jun 02 2005 - 16:24:23 CDT
The IANA character sets list
(http://www.iana.org/assignments/character-sets) says:
<quote>
Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
this needs to specify network byte order: the standard
does not specify (it is a 16-bit integer space)
Alias: csUnicode
Name: ISO-10646-UCS-4
MIBenum: 1001
Source: the full code space. (same comment about byte order,
these are 31-bit numbers.
Alias: csUCS4
</quote>
I interpret this to mean that these are character encoding forms
(CEFs), not character encoding schemes (CESs) or charsets. They would
not be the only items in the charsets list that are not charsets.
In practice, if you do see them specified, you might want to check
whether the text begins with what looks like a BOM. In other words, it
may be best to reinterpret them as the "UTF-16" and "UTF-32" charsets.
Or reject the text with an error. It's the sender's fault for using
these names :-)
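
The two options above could be sketched roughly as follows in Python. This is only an illustration, not a recommendation from any standard: the function name decode_ucs_labelled, the alias table, and the strict flag are all made up for this example, and falling back to big-endian when no BOM is present is an assumption based on the "network byte order" comment in the IANA entry.

```python
import codecs

# Hypothetical mapping from the IANA names (and their aliases) to the
# Unicode charsets they are reinterpreted as in this sketch.
_REINTERPRET = {
    "iso-10646-ucs-2": "utf-16",
    "csunicode": "utf-16",
    "iso-10646-ucs-4": "utf-32",
    "csucs4": "utf-32",
}


def decode_ucs_labelled(data: bytes, charset: str, strict: bool = False) -> str:
    """Decode text labelled ISO-10646-UCS-2/4 by treating it as UTF-16/32.

    With strict=True the label is rejected outright (the "reject the
    text with an error" option).
    """
    target = _REINTERPRET.get(charset.strip().lower())
    if target is None:
        raise LookupError(f"not a UCS-2/UCS-4 label: {charset!r}")
    if strict:
        raise ValueError(f"ambiguous charset label: {charset!r}")

    if target == "utf-32":
        boms = (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)
    else:
        boms = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

    if data.startswith(boms):
        # The plain "utf-16"/"utf-32" codecs read the BOM to pick the
        # byte order and drop it from the decoded text.
        return data.decode(target)

    # No BOM: assume big-endian (an assumption of this sketch).
    return data.decode(target + "-be")
```

For example, decode_ucs_labelled(b"\xff\xfeh\x00i\x00", "ISO-10646-UCS-2") honours the little-endian BOM, while the same call with strict=True raises an error instead of guessing.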
On 6/2/05, Theo Veenker <Theo.Veenker@let.uu.nl> wrote:
> If someone sends me a text file marked charset=ISO-10646-UCS-2
> or charset=ISO-10646-UCS-4, should an initial BOM in this file have
> the same meaning as a BOM in UTF-16/32?
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.