Re: UCS-2/4 & BOM

From: Erik van der Poel (erik@vanderpoel.org)
Date: Thu Jun 02 2005 - 19:47:25 CDT


    Markus Scherer wrote:
    > The IANA character sets list
    > (http://www.iana.org/assignments/character-sets) says:
    >
    > <quote>
    > Name: ISO-10646-UCS-2
    > MIBenum: 1000
    > Source: the 2-octet Basic Multilingual Plane, aka Unicode
    > this needs to specify network byte order: the standard
    > does not specify (it is a 16-bit integer space)
    > Alias: csUnicode
    > </quote>
    >
    > I interpret this to mean that these are CEFs, not CESs or charsets.

    The term "network byte order" is often used in network protocol
    communities, and it means big-endian (see e.g. RFC 951 section 3). So,
    another interpretation of "this needs to specify network byte order" is
    that this charset registration entry still needs to be amended to
    properly specify that "they" have big-endian in mind. I personally think
    that this is more likely to be the intended interpretation, though I
    wouldn't argue with anyone saying that the wording is unclear.
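The sketch below uses Python and assumes the entry is read the way Erik suggests: big-endian (network byte order) and no byte order mark. The helper names `ucs2_network_order` and `ucs4_network_order` are hypothetical and do not come from IANA or from any library. The sketch shows the CEF/CES distinction Markus raised. A UCS-2 code unit is just a 16-bit integer, and it only becomes a byte sequence once a byte order is chosen. Python's `struct` format prefix `!` means "network", which is big-endian.

```python
import struct

def ucs2_network_order(text: str) -> bytes:
    """Serialize BMP-only text as 16-bit big-endian units, with no BOM (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 is limited to the Basic Multilingual Plane.
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot represent it")
        # "!H": unsigned 16-bit integer in network (big-endian) byte order.
        out += struct.pack("!H", cp)
    return bytes(out)

def ucs4_network_order(text: str) -> bytes:
    """Serialize text as 32-bit big-endian units, with no BOM (hypothetical helper)."""
    # "!I": unsigned 32-bit integer in network (big-endian) byte order.
    return b"".join(struct.pack("!I", ord(ch)) for ch in text)

# The same code unit, U+00E9, serialized in both byte orders:
print(struct.pack("!H", 0x00E9))  # big-endian: b'\x00\xe9'
print(struct.pack("<H", 0x00E9))  # little-endian: b'\xe9\x00'
```

Without a stated byte order (or a BOM), a receiver cannot tell which of the two byte sequences it is looking at. That ambiguity is why the registration wording matters. For BMP characters other than surrogate code points, the output of `ucs2_network_order` matches Python's `utf-16-be` codec.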

    I'm Cc-ing the ietf-charsets list, in the hope that this entry might be
    clarified (along with the UCS-4 entry).

    Erik van der Poel



    This archive was generated by hypermail 2.1.5 : Thu Jun 02 2005 - 19:50:00 CDT