Re: newbie: unicode (when used as a coding) = UTF16LE?

From: Jungshik Shin (jshin@mailaps.org)
Date: Thu Feb 13 2003 - 00:27:34 EST

    On Thu, 13 Feb 2003, Zhang Weiwu wrote:

    > Very newbie question:
    > 1) I noticed when I save a file as "unicode" in Windows 2000, or
    > other editor like EditPlus, the file begins with FF FE, which looks
    > like UTF16LE. Also it seems to me when ContentType in a html page is
    > "unicode", IE tends to understand it as UTF16LE. So it seems UTF16LE is
    > (or was) the standard coding for unicode.

      What Windows or IE does does not make a practice any more standard
    than it actually is. For Windows and MS IE running on
    Intel x86 machines, which are little-endian, it may be pretty natural
    to use UTF-16LE, but that does not hold for other architecture/OS
    combinations.
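
A minimal Python sketch of how the leading bytes of a file can be checked for a byte order mark. The function name `sniff_bom` and the sample bytes are hypothetical, chosen only for illustration. The check is a heuristic, not a full encoding detector.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess the UTF from a leading byte order mark, if one is present."""
    # The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
    # so the longer marks are tested first. A UTF-16LE file whose first
    # character is U+0000 would be misreported; that case is rare.
    if data.startswith(codecs.BOM_UTF32_LE):
        return "UTF-32LE"
    if data.startswith(codecs.BOM_UTF32_BE):
        return "UTF-32BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    return "unknown"

# Hypothetical sample: "Hi" saved with a BOM in little-endian UTF-16.
sample = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
print(sample.hex(" "))    # ff fe 48 00 69 00
print(sniff_bom(sample))  # UTF-16LE
```

A file that starts with FE FF instead would be big-endian UTF-16, which is why the byte order cannot be assumed from the word "unicode" alone.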

    > 2) But on the FAQ on unicode.org, it says UTF16BE is the preferred
    > unicode coding.
    >
    > Is it that, when people say "unicode" without UTF, they mean UTF16LE?

      No, UTF-16LE is just one of many Unicode transformation form(at)s.
    Each UTF has its own pros and cons and you have to choose
    whatever is appropriate for your own need.
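
One of those trade-offs is encoded size, which depends on the mix of characters. The Python sketch below counts the bytes each UTF needs for a short CJK string and for a short run of ASCII markup. The helper `encoded_sizes` and both sample strings are hypothetical, used only for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text in three UTFs (no BOM)."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}

cjk = "中文"        # two CJK characters from the Basic Multilingual Plane
markup = "<p></p>"  # seven ASCII characters, as found in HTML tags

print(encoded_sizes(cjk))     # {'utf-8': 6, 'utf-16-le': 4, 'utf-32-le': 8}
print(encoded_sizes(markup))  # {'utf-8': 7, 'utf-16-le': 14, 'utf-32-le': 28}
```

CJK text takes three bytes per character in UTF-8 and two in UTF-16, while ASCII markup takes one byte in UTF-8 and two in UTF-16, so the total for a real page depends on the ratio of markup to text.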

    > I am going to design a website with unicode. I don't use UTF-8 because
    > most are CJK text thus UTF-8 html would be too fat. I should use UTF16LE,
    > should I?

      Whatever UTF you decide to use, the only thing you have to take care
    of is to label/mark it in a standard-compliant way. If you want to
    use UTF-16LE, you should make sure that your web server
    emits the correct HTTP header, with Content-Type as follows
    (note that a meta tag at the beginning of an HTML file
    doesn't work well for UTF-16/UTF-32):

    Content-Type: text/html; charset=UTF-16LE

    On top of that, you may wish to put a BOM at the very beginning of
    your UTF-16LE HTML files, although that's not necessary
    with the correct Content-Type HTTP header as above.
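
A minimal sketch, in Python, of a server emitting that header along with a UTF-16LE body and an optional BOM. The names `utf16le_response` and `app` and the sample page are hypothetical. A real deployment would more likely set the charset in the web server's configuration.

```python
import codecs

def utf16le_response(html: str, with_bom: bool = True):
    """Encode html as UTF-16LE and build the matching HTTP headers."""
    body = html.encode("utf-16-le")  # this codec never adds a BOM itself
    if with_bom:
        body = codecs.BOM_UTF16_LE + body  # FF FE, optional per the text above
    headers = [
        ("Content-Type", "text/html; charset=UTF-16LE"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body

def app(environ, start_response):
    """Tiny WSGI application serving one hypothetical UTF-16LE page."""
    headers, body = utf16le_response("<html><body>中文</body></html>")
    start_response("200 OK", headers)
    return [body]
```

For a local test, the application could be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.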

    BTW, you MUST NOT use 'charset=unicode' assuming that it'll be
    interpreted as 'utf-16le'. See http://www.i18nguy.com/unicode
    and http://jshin.net/i18n/utftest

       Jungshik



    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 01:05:23 EST