Re: UTF-8 in web pages

From: Chris Wendt
Date: Fri Feb 05 1999 - 17:33:50 EST

>For Far East countries (Japan, Korea, PRC, Taiwan), UTF-8 is 1.5 times
>bigger than their local encodings.

True, but that hardly matters anymore, because pages today contain _way_ more
markup and images than actual content :-(
Or maybe a picture says more than a thousand words...
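To put rough numbers on both points: common CJK characters take 3 bytes in UTF-8 but 2 bytes in Shift_JIS or Big5, while ASCII markup costs 1 byte per character in all of them, so the overall penalty shrinks as the share of markup grows. A minimal Python sketch; the sample strings, the markup, and the helper names are made up for illustration.

```python
def encoded_sizes(text: str, legacy: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, legacy-encoding byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode(legacy))


def wrap_in_markup(body: str) -> str:
    """Wrap body in some illustrative, purely ASCII HTML markup (hypothetical)."""
    return (
        "<html><head><title>Sample</title></head>"
        '<body><table width="100%"><tr><td class="content">'
        + body
        + "</td></tr></table></body></html>"
    )


# Each sample pairs CJK text with a legacy encoding that covers it.
samples = [("日本語のテキスト", "shift_jis"), ("中文網頁內容", "big5")]

for body, legacy in samples:
    for label, text in (("text only", body), ("with markup", wrap_in_markup(body))):
        utf8, other = encoded_sizes(text, legacy)
        print(f"{legacy} {label}: UTF-8 {utf8} bytes, "
              f"{legacy} {other} bytes, ratio {utf8 / other:.2f}")
```

For the text-only samples the ratio is exactly 1.5; once the same characters sit inside markup, the ratio drops below that, because the markup bytes are identical in both encodings.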

>Also, please remember Unicode unifies Han characters,

You can influence this by tagging the content with a lang attribute, e.g.
<HTML lang=zh-cn>. In this example, unified Han characters will preferably
be rendered with a Simplified Chinese font.

>but sometimes it will be displayed in a Chinese font

Again, this will not happen if you apply language attributes to the Han text.
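A small illustration of tagging runs of Han text by language in a UTF-8 page. The same code point (here U+76F4, 直, which is often cited as drawn differently in Japanese and Chinese fonts) appears three times; only the lang attribute tells the browser which convention to prefer, and whether it actually switches fonts depends on the browser and the fonts installed. The helper names are hypothetical, and the meta tag is the one quoted further down in this thread.

```python
from html import escape


def lang_span(text: str, lang: str) -> str:
    """Wrap a run of text in a span carrying its language tag."""
    return f'<span lang="{escape(lang)}">{escape(text)}</span>'


def build_page(runs: list[tuple[str, str]], page_lang: str = "en") -> bytes:
    """Build a small UTF-8 HTML page; each (text, lang) run gets its own lang."""
    body = "\n".join(f"<p>{lang_span(text, lang)}</p>" for text, lang in runs)
    doc = (
        f'<html lang="{escape(page_lang)}">\n<head>\n'
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head>\n<body>\n" + body + "\n</body>\n</html>\n"
    )
    # The charset declared in the meta tag must match the actual encoding.
    return doc.encode("utf-8")


page = build_page([("直", "ja"), ("直", "zh-CN"), ("直", "zh-TW")])
print(page.decode("utf-8"))
```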

The strength of UTF-8 is visible in multilingual sites. The (now defunct)
Internet Explorer Channel Guide was authored entirely in UTF-8 because that
allowed us to preview channels of any language within a framework of pages in
any other language. We did not want to use frames, but instead to display the
channel description, fed by a UTF-8-only database, within any other language's
pages.

UTF-8 is your only option for languages without widely established multibyte
encodings. Just pick any Unicode range between Arabic and Hangul and try to
find a widely supported multibyte encoding that covers these scripts.
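One way to try this yourself: attempt to encode a string in a handful of legacy encodings and see which ones succeed. A minimal Python sketch; the candidate list uses Python codec names and is an illustrative assumption, not an exhaustive survey.

```python
# Illustrative candidates (Python codec names): Latin-1 plus the common CJK encodings.
CANDIDATES = ["iso8859_1", "shift_jis", "euc_kr", "gb2312", "big5", "utf_8"]


def encodings_that_fit(text: str, candidates: list[str] = CANDIDATES) -> list[str]:
    """Return the candidate encodings that can represent every character of text."""
    fits = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no mapping in this encoding
        fits.append(name)
    return fits


print(encodings_that_fit("नमस्ते"))       # Devanagari → ['utf_8']
print(encodings_that_fit("漢字 नमस्ते"))  # Han + Devanagari → ['utf_8']
print(encodings_that_fit("中文"))         # Han only: several CJK encodings fit
```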

-----Original Message-----
From: Yoshifumi Inoue <>
To: Unicode List <>
Date: Friday, February 05, 1999 1:18 PM
Subject: RE: UTF-8 in web pages

>In the Japanese version of Netscape Navigator 4 on Windows, the default
>font for the UTF-8 encoding is "Times New Roman".
>If you don't specify a font for the text, e.g. with a FONT tag or CSS, you
>cannot see UTF-8-encoded Japanese characters.
>For Far East countries (Japan, Korea, PRC, Taiwan), UTF-8 is 1.5 times
>bigger than their local encodings.
>Also, please remember Unicode unifies Han characters. If a page contains
>both Japanese Kanji and Chinese Hanzi, it will look odd, because Japanese
>readers expect Japanese Kanji in the text, but sometimes it will be
>displayed in a Chinese font unless you explicitly specify a font for each
>UTF-8 only exchanges character information. It does not provide us with
>script information, e.g. rendering fonts.
>- yosi
> -----Original Message-----
>From: []
>Sent: Friday, February 05, 1999 10:47 AM
>To: Unicode List
>Subject: Re: UTF-8 in web pages
>current versions of internet explorer, netscape, and lynx all support
>unicode encodings.
>unicode is _the_ html character set since version 3.2, i.e., all unicode
>characters are supported by html. for example, (hexa)decimal numbers in
>character entities are resolved as unicode code points.
>the default charset is still iso 8859-1 - which is a subset of unicode,
>i guess you know
> <meta http-equiv="Content-Type" Content="text/html; charset=utf-8">
>the xml standard requires that clients are able to handle utf-8 and utf-16.
>best regards,
>Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
> Unicode is here! -->
>"John O'Conner" <> on 99-02-05 12:15:33
>To: Unicode List <>
>Subject: UTF-8 in web pages
>I have a client that has a requirement to support several
>languages on their website and e-commerce store. I want to
>help them manage the storage of information and dynamic web
>pages by suggesting a common character set for all
>It seems like a no-brainer to select Unicode for my database
>character set because of their multi-language needs.
>However, I'm concerned about Unicode in web pages. I have
>browsed several UTF-8 pages with success, but I notice that
>the industry hasn't really picked up on UTF-8 as an HTML
>content encoding. Do any of you have any success/failure
>stories that you can share? How comfortable would you be
>recommending UTF-8 for HTML content? Oh, here's one more
>piece of information...the customer has traditionally used
>Big 5 for all their encoding needs. Actually...they've used
>an extension for their special chars in Hong Kong that don't
>seem to be available in Big 5.
>John O'Conner

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:44 EDT