Re: newbie: unicode (when used as a coding) = UTF16LE?

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Feb 13 2003 - 11:24:46 EST

    John Cowan <cowan at mercury dot ccil dot org> wrote:

    >> On top of that, you may wish to put a BOM at the very beginning of
    >> your UTF-16LE HTML files, although that's not necessary
    >> with the correct Content-Type HTTP header as above.
    >
    > No, no! In UTF-16LE, if the first two bytes are FF FE, that means an
    > actual ZWNBSP character. (Analogously in UTF-16BE.) The whole point
    > of the charsets "UTF-16LE" and "UTF-16BE" is that there is no BOM.
    > In the charset "UTF-16", however, there may or may not be a BOM.

    Thanks for the correction to my post comparing "UTF-16LE" and "UTF-16".
    I had written that "UTF-16" implies the presence of a BOM. You are
    correct that the BOM may or may not be present. Furthermore, if it is
    not, big-endian is assumed (the source of Weiwu's original question
    about big-endian being "preferred").
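
    To make the three labels concrete, here is a minimal Python sketch of
    the rules as described above. The helper name decode_utf16_labelled is
    hypothetical, made up for illustration; it relies only on Python's
    built-in "utf-16-le" and "utf-16-be" codecs, which never strip a
    leading U+FEFF. It deliberately avoids Python's plain "utf-16" codec
    for the BOM-less case, since that codec falls back to the platform's
    native byte order rather than big-endian.

```python
def decode_utf16_labelled(data: bytes, label: str) -> str:
    """Decode bytes according to a charset label (hypothetical helper)."""
    label = label.upper()
    if label == "UTF-16LE":
        # No BOM is defined for this label: a leading FF FE is kept
        # and shows up in the text as U+FEFF (ZWNBSP).
        return data.decode("utf-16-le")
    if label == "UTF-16BE":
        # Same idea: a leading FE FF is kept as U+FEFF.
        return data.decode("utf-16-be")
    if label == "UTF-16":
        # A BOM may or may not be present.
        if data[:2] == b"\xff\xfe":
            return data[2:].decode("utf-16-le")  # BOM consumed
        if data[:2] == b"\xfe\xff":
            return data[2:].decode("utf-16-be")  # BOM consumed
        # No BOM: big-endian is assumed.
        return data.decode("utf-16-be")
    raise ValueError("unsupported label: " + label)


# The same four bytes, read under two different labels.
sample = b"\xff\xfeA\x00"
as_le = decode_utf16_labelled(sample, "UTF-16LE")   # U+FEFF stays in the text
as_plain = decode_utf16_labelled(sample, "UTF-16")  # U+FEFF consumed as BOM
```

    Under "UTF-16LE" the result has two characters (U+FEFF followed by
    "A"); under "UTF-16" it has one.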

    But having said that...

    Suppose Weiwu follows Jungshik's suggestion and inserts the character
    U+FEFF at the beginning of his HTML. And suppose John is right, that
    because the file is tagged "UTF-16LE" the character U+FEFF actually
    represents a ZWNBSP instead of a BOM.

    What harm has been done? It's a Web page, not a data file for which
    absolute byte-for-byte fidelity is required. The ZWNBSP is totally
    invisible to the user viewing the page. It has no behavior -- a ZWNBSP
    is supposed to prevent a break between the preceding and following
    characters, but in this case there is no preceding character, so what is
    the ZWNBSP supposed to do?

    In some future version of Unicode, say 5.0 or above, I'd really like to
    see a resolution to this nonsensical "initial ZWNBSP" case. U+FEFF at
    the beginning of a file or stream (not fragment) could logically only be
    a BOM. We have U+2060 WORD JOINER to handle the ZWNBSP semantic now.
    Talk about something that needs to be deprecated.
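
    As a rough sketch of what such a policy might look like in a consumer
    of already-decoded text (this is only an illustration of the idea, not
    anything the Unicode Standard currently mandates, and the function
    name normalize_feff is hypothetical): drop one U+FEFF at the very
    start, and rewrite any later U+FEFF as U+2060 WORD JOINER.

```python
FEFF = "\ufeff"         # BOM / ZERO WIDTH NO-BREAK SPACE
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def normalize_feff(text: str) -> str:
    """Treat an initial U+FEFF as a BOM; map the rest to U+2060.

    Hypothetical helper, assuming `text` is a whole file or stream,
    not a fragment.
    """
    if text.startswith(FEFF):
        # At the start of a file or stream it can logically only be a BOM.
        text = text[1:]
    # Anywhere else it carries the ZWNBSP meaning, which U+2060 now covers.
    return text.replace(FEFF, WORD_JOINER)
```

    Only the first U+FEFF is dropped; a second one directly after it is
    no longer "initial" and is mapped to U+2060 like any other.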

    -Doug Ewell
     Fullerton, California


