Re: newbie: unicode (when used as a coding) = UTF16LE?

From: Doug Ewell
Date: Thu Feb 13 2003 - 02:18:02 EST


    Zhang Weiwu <weiwuzhang at hotmail dot com> asked:

    > Is it that, when people say "unicode" without UTF, they mean UTF16LE?

    and Jungshik Shin <jshin at mailaps dot org> responded:

    > No, UTF-16LE is just one of many Unicode transformation form(at)s.
    > Each UTF has its own pros and cons and you have to choose
    > whatever is appropriate for your own need.

    but I'm not sure that answered the question Weiwu was really asking.

    It is true that when Windows and other Microsoft products refer to
    "Unicode," without qualification, they usually mean UTF-16
    little-endian. (Note that "UTF-16 little-endian" is not technically the
    same as "UTF-16LE"; the former implies the presence of a BOM while the
    latter implies that none is present.)
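
    A minimal sketch in Python of the distinction drawn above, assuming the
    standard-library codec name "utf-16-le" (which writes no BOM) and the
    constant codecs.BOM_UTF16_LE. The sample text and variable names are
    illustrative only.

```python
import codecs

text = "A"  # U+0041, an arbitrary sample character

# UTF-16LE encoding scheme: little-endian code units, no BOM.
without_bom = text.encode("utf-16-le")

# Little-endian UTF-16 with a BOM: U+FEFF serialized as FF FE,
# followed by the same little-endian code units.
with_bom = codecs.BOM_UTF16_LE + without_bom

print(without_bom.hex())  # 4100
print(with_bom.hex())     # fffe4100
```

    The two byte sequences differ only in the leading FF FE.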

    Despite this Microsoft convention, however, it is not true that
    "Unicode" automatically means UTF-16, of any type. This was once the
    case -- as late as TUS 3.0, we were told that "Plain Unicode text
    consists of sequences of 16-bit character codes" (p. 12) -- but it is no
    longer true. UTF-8 and UTF-32 are now on equal footing with UTF-16.
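
    As an illustration of the three forms standing side by side, the sketch
    below encodes one code point in each of them using Python's built-in
    codecs. The choice of U+4E2D is arbitrary, and the big-endian variants
    are used only so that no BOM appears in the output.

```python
ch = "\u4e2d"  # U+4E2D, chosen arbitrarily for illustration

utf8 = ch.encode("utf-8")       # three 8-bit code units
utf16 = ch.encode("utf-16-be")  # one 16-bit code unit, big-endian
utf32 = ch.encode("utf-32-be")  # one 32-bit code unit, big-endian

print(utf8.hex(), utf16.hex(), utf32.hex())  # e4b8ad 4e2d 00004e2d

# Each form round-trips to the same character.
roundtrip = [
    utf8.decode("utf-8"),
    utf16.decode("utf-16-be"),
    utf32.decode("utf-32-be"),
]
```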

    If you do include a BOM, I don't see any reason you can't send
    little-endian UTF-16 down the line. The "preference" of big-endian
    UTF-16 over little-endian has to do with the assumption to be made when
    no BOM is present. When there is a BOM, no assumptions are necessary;
    software should interpret the text as BE or LE according to the byte
    order of the BOM itself (FE FF for big-endian, FF FE for little-endian).
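
    The rule described above can be sketched in Python as follows. The
    function name decode_utf16 is hypothetical, and the sketch assumes the
    input is already known to be UTF-16 of some kind (it does not try to
    tell UTF-16 apart from UTF-32, whose little-endian BOM also begins with
    FF FE).

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes: honor a BOM if present, else assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    # No BOM: fall back to the big-endian assumption.
    return data.decode("utf-16-be")


print(decode_utf16(b"\xff\xfeh\x00i\x00"))  # hi  (BOM says little-endian)
print(decode_utf16(b"\xfe\xff\x00h\x00i"))  # hi  (BOM says big-endian)
print(decode_utf16(b"\x00h\x00i"))          # hi  (no BOM, big-endian assumed)
```

    Python's own "utf-16" codec also honors a BOM, but without one it falls
    back to the platform's native byte order rather than big-endian, which
    is why the fallback is spelled out explicitly here.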

    (BTW, I thought Weiwu's so-called "newbie question" was much better
    expressed and demonstrated better understanding of Unicode than many
    non-newbie questions I have seen on this list.)

    -Doug Ewell
     Fullerton, California

    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 03:06:11 EST