Re: Encoding????????????

From: Brendan Murray/DUB/Lotus (Brendan_Murray@Lotus.com)
Date: Thu Sep 28 2000 - 04:30:10 EDT


"Sandeep Krishna" <sandeepkrishna@noida.hcltech.com> wrote:
> can someone tell me...what does the Encoding in the browser (IE5)
> imply.....
> does it mean that the Encoding (say UTF-8 or Chinese Big5) shall be used
> for encoding/decoding any data ..(page) to be displayed or sent....
>
> i mean if i use an encoding like Big5 .... how does it encode a chinese
> character...similar to utf-8 or differently..???
> and can i display a Korean character... using big5???

Encoding in this situation simply means the codepage: Big5 is the codepage
used for Traditional Chinese (CP 950). For Korean, use KS C 5601 (CP 949).
Both of these are limited in their coverage: Big5 doesn't contain Hangul,
while KS C 5601 doesn't encode all the Chinese ideographs. So no, you can't
display Korean Hangul using Big5.
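
As a rough illustration of that coverage gap, here is a small Python sketch
(Python's codec tables stand in for the Windows codepages; the helper name
can_encode is made up for this example and is not anything IE uses):

```python
def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


hangul = "\ud55c"     # U+D55C, a Hangul syllable
ideograph = "\u4e2d"  # U+4E2D, an ideograph used in both Chinese and Korean

print(can_encode(hangul, "big5"))      # False: Big5 has no Hangul
print(can_encode(hangul, "cp949"))     # True
print(can_encode(ideograph, "big5"))   # True
print(can_encode(ideograph, "cp949"))  # True
```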

On the other hand, UTF-8 is an encoding form of Unicode, so it can
represent every character that can be encoded in Unicode, i.e. the full set
defined in Unicode 3.0 as well as any others added in future revisions. In
other words, it encodes a universal character set. If you want to display
data from multiple languages, you really have to use UTF-8 or some other
Unicode/ISO 10646 encoding.
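
To make that concrete, a minimal Python sketch (the sample string is just
an assumption chosen for illustration) shows one string mixing Chinese,
Korean and accented Latin text surviving a round trip through UTF-8 and
through UTF-16:

```python
# Chinese ideographs, Hangul syllables and an accented Latin letter in one string
mixed = "\u4e2d\u6587 / \ud55c\uae00 / caf\u00e9"

utf8_bytes = mixed.encode("utf-8")
utf16_bytes = mixed.encode("utf-16")

# Both Unicode encodings carry every character and decode back unchanged
print(utf8_bytes.decode("utf-8") == mixed)
print(utf16_bytes.decode("utf-16") == mixed)
print(len(mixed), "characters,", len(utf8_bytes), "bytes in UTF-8")
```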

Each encoding differs, although there are often similarities. For example,
many encodings are based on ASCII, so the lower 128 byte values overlap,
while the upper 128 differ. Big5 uses these upper-range bytes as the first
byte of a two-byte character, and although UTF-8 also uses them for its
multi-byte sequences, the bytes it assigns to any given character are
completely different from Big5's.
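
A short Python sketch of the difference (using Python's "big5" codec as a
stand-in for CP 950, which is an assumption for this example): an ASCII
letter comes out as the same byte in both encodings, while the same
ideograph gets entirely different bytes.

```python
letter = "A"
ideograph = "\u4e2d"  # U+4E2D

# ASCII range: identical single byte in both encodings
print(letter.encode("big5").hex(), letter.encode("utf-8").hex())        # 41 41

# Outside ASCII: two bytes in Big5, three unrelated bytes in UTF-8
print(ideograph.encode("big5").hex(), ideograph.encode("utf-8").hex())  # a4a4 e4b8ad
```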

If you tell IE that the encoding is Big5 when it is really UTF-8, you'll
get a corrupted display. It will be particularly confusing if there is
some ASCII data mixed in, since this will display correctly, but those
bytes where the upper bit is set will be interpreted as Big5 lead bytes
and the trail bytes that follow them.
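
The effect can be reproduced outside the browser. This is only a sketch in
Python, assuming its "big5" codec behaves roughly like IE's Big5 handling;
the sample text is made up for the example:

```python
text = "abc \u4e2d\u6587"  # ASCII followed by two ideographs
raw = text.encode("utf-8")

# Decode the UTF-8 bytes with the wrong label, substituting U+FFFD for
# any byte sequence that is not valid Big5
wrong = raw.decode("big5", errors="replace")
right = raw.decode("utf-8")

print(repr(wrong))  # the "abc " prefix survives, the rest is garbage
print(repr(right))  # the original text
```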

To get an idea as to how these codepages hang together for Windows, have a
look at http://www.microsoft.com/globaldev/reference/wincp.asp

Brendan



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT