Charset declaration in HTML (was: Romanized Singhala - Think about it again) from Otto Stolz on 2012-07-04 (Unicode Mail List Archive)

From: Otto Stolz <Otto.Stolz_at_uni-konstanz.de>
Date: Wed, 04 Jul 2012 21:25:45 +0200

Hello Naena Guru,

on 2012-07-04, you wrote:
> The purpose of
> declaring the character set as iso-8859-1 than utf-8 is to avoid doubling
> and trebling the size of the page by utf-8. I think, if you have characters
> outside iso-8859-1 and declare the page as such, you get
> Character-not-found for those locations. (I may be wrong).

You are wrong, indeed.

If you declare your page as ISO-8859-1, every octet
(aka byte) in your page will be understood as a Latin-1
character; hence you cannot have any other character
in your page. So, your notion of “characters outside
iso-8859-1” is completely meaningless.
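A minimal Python sketch of this point, assuming Python's strict "iso-8859-1" codec as a stand-in for what a Latin-1 reader does (the variable names are illustrative only). Every octet value decodes to exactly one Latin-1 character, so UTF-8 octets labelled as ISO-8859-1 are not rejected, just misread:

```python
# Every possible octet value (0..255) maps to exactly one Latin-1 character,
# so decoding as ISO-8859-1 can never fail.
all_octets = bytes(range(256))
decoded = all_octets.decode("iso-8859-1")

# The UTF-8 octets of one Sinhala letter (U+0D85), read as Latin-1 instead:
sinhala_letter = "\u0d85"
utf8_octets = sinhala_letter.encode("utf-8")   # three octets: E0 B6 85
misread = utf8_octets.decode("iso-8859-1")     # three Latin-1 characters

print(len(decoded))                    # one character per octet
print([hex(b) for b in utf8_octets])
print([hex(ord(c)) for c in misread])  # same values, now taken as characters
```

The decoder never reports a missing character; it simply yields one Latin-1 character per octet, which is why the result shows up as mojibake rather than as an error.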

If you declare your page as UTF-8, you can have
any Unicode character (even PUA characters) in
your page.
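As a rough illustration, assuming Python's built-in "utf-8" codec, the following sketch encodes a few sample characters (chosen here only as examples, including one from the Private Use Area) and checks that each survives a round trip:

```python
# Sample characters from different ranges; the selection is illustrative.
samples = {
    "ASCII letter a": "a",
    "Latin-1 e with acute": "\u00e9",
    "Sinhala letter U+0D85": "\u0d85",
    "Private Use Area U+E000": "\ue000",
    "Supplementary plane U+1F600": "\U0001F600",
}

# Number of octets each character needs in UTF-8.
sizes = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Each character survives encoding and decoding unchanged.
round_trip_ok = all(
    ch.encode("utf-8").decode("utf-8") == ch for ch in samples.values()
)

for name, size in sizes.items():
    print(f"{name}: {size} octet(s)")
print("round trip ok:", round_trip_ok)
```

Only characters outside ASCII take more than one octet, so a page that is mostly markup and Latin letters grows far less than "doubling and trebling" would suggest.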

Regardless of the charset declaration of your page,
you can include both Numeric Character References
and Character Entity References in your HTML source,
see, e.g., <http://www.w3.org/TR/html401/charset.html#h-5.3>.
These may refer to any Unicode character whatsoever.
However, they will take considerably more storage space
(and transmission bandwidth) than the UTF-8 encoded
characters would take.
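The size difference can be estimated with a short Python sketch. The sample string of five Sinhala code points is an assumption chosen for illustration, and the decimal references are produced with Python's "xmlcharrefreplace" error handler:

```python
import html

# Five Sinhala code points, used only as sample data.
text = "\u0d85\u0db8\u0dca\u0db8\u0dcf"

# Size when the characters are stored directly as UTF-8.
utf8_size = len(text.encode("utf-8"))

# Size when every character is written as a decimal Numeric Character
# Reference (e.g. &#3461;) in an otherwise pure-ASCII page.
ncr = text.encode("ascii", errors="xmlcharrefreplace")
ncr_size = len(ncr)

# The references still denote the same characters.
same_text = html.unescape(ncr.decode("ascii")) == text

print(ncr.decode("ascii"))
print("UTF-8 octets:", utf8_size)
print("NCR octets:  ", ncr_size)
```

For this sample, each character costs three octets in UTF-8 but seven as a decimal reference, so the page works under either charset declaration but is more than twice as large with references.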

Good luck,
Otto Stolz
Received on Wed Jul 04 2012 - 14:28:21 CDT
