Re: How to use Unicode on XML/HTML pages

From: Otto Stolz (
Date: Wed Oct 27 1999 - 18:43:48 EDT

On 1999-10-22 at 1:36, Denice Szafran Liscomb <> asked
> how to code [characters from the Latin Extended A range] on an HTML or XML
> page to make the characters appear properly.

I can give advice only for HTML. I sent most of this to the Unicode
List back in July.

> Where do I find this information?

Cf. <>, in particular
<>: according
to the HTML 4.0 definition, you may choose the most convenient code
page (which is regarded as a transport vehicle only) for your HTML
page and use character references, such as "&euro;", "&#8364;", or
"&#x20AC;", for those characters that are not in the code page chosen
for the transfer. In practice, however, this works well *only* when
you choose UTF-8 as your transfer encoding; then, of course, you won't
need to resort to numeric character references (but you are free to
use them if they are convenient for you). Examples of UTF-8 based pages:
and the attached file.
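
The fallback to numeric character references can be sketched in Python
(this is not part of the original message; the sample string and the
choice of ISO-8859-1 as the transfer code page are assumptions made for
illustration). The standard "xmlcharrefreplace" error handler replaces
each character missing from the target code page with a decimal
reference:

```python
import html

# Sample text (an assumption for illustration): two Latin Extended-A
# letters (U+0141, U+017A), one Latin-1 letter (U+00F3), and the euro sign.
text = "\u0141\u00f3d\u017a \u20ac"

# With ISO-8859-1 as the transfer code page, characters outside Latin-1
# fall back to decimal numeric character references.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# With UTF-8, every character is encoded directly; no references needed.
utf8_bytes = text.encode("utf-8")

# A decoder resolves named, decimal, and hexadecimal references alike.
resolved = [html.unescape(ref) for ref in ("&euro;", "&#8364;", "&#x20AC;")]
```

Here only the Latin-1 letter survives as a raw byte in latin1_bytes;
the other characters become references, whereas utf8_bytes contains
no references at all.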

This scheme is defined only for HTML 4.0, so you will need to mark your
document as an HTML 4.0 document, cf.
(The example page from Reuters does not, however, include this
mandatory declaration.)

You cannot legally include Latin-2 characters in pre-4.0 HTML, as HTML 3.2
mandates Latin-1. In HTML 4, you must specify any encoding other than
Latin-1; so your Latin-2, or UTF-8, encoded HTML pages must either be
sent with an appropriate HTTP header field, or they must contain a Meta
tag, as discussed above.
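
As a rough sketch of the two ways to declare the encoding (an
illustration only, not from the original message; build_page and the
sample title and body are hypothetical), the following Python builds a
UTF-8 page that carries the strict HTML 4.0 document type declaration
and a META element, and shows the equivalent HTTP header field:

```python
# Strict HTML 4.0 document type declaration.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" '
           '"http://www.w3.org/TR/REC-html40/strict.dtd">')


def build_page(title: str, body: str, charset: str = "UTF-8") -> bytes:
    """Hypothetical helper: return an HTML 4.0 page encoded in `charset`.

    `title` and `body` are inserted as-is, so they are assumed to be
    free of markup characters such as "<" and "&".
    """
    meta = ('<META HTTP-EQUIV="Content-Type" '
            f'CONTENT="text/html; charset={charset}">')
    page = "\n".join([
        DOCTYPE,
        "<HTML>",
        "<HEAD>",
        meta,  # placed before TITLE so non-ASCII text follows the declaration
        f"<TITLE>{title}</TITLE>",
        "</HEAD>",
        f"<BODY><P>{body}</P></BODY>",
        "</HTML>",
    ])
    # The bytes actually sent must be in the declared encoding.
    return page.encode(charset)


# Alternative (or addition) to the META element: the HTTP header field.
http_header = "Content-Type: text/html; charset=UTF-8"

page_bytes = build_page("Latin Extended-A test", "\u0141\u00f3d\u017a")
```

Whichever mechanism is used, the declared charset has to match the
encoding the file is really stored and served in.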

I also recommend tagging the various parts of your HTML page with their
respective languages,
cf. <>, in particular
<> combined
with <>,
<> and

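Language tagging can be sketched as follows, assuming the HTML 4.0 LANG
attribute is the mechanism meant here (tag_language and the sample text
are hypothetical, chosen only for illustration):

```python
import html


def tag_language(text: str, lang: str, element: str = "SPAN") -> str:
    """Hypothetical helper: wrap `text` in an element with a LANG attribute."""
    return (f'<{element} LANG="{html.escape(lang, quote=True)}">'
            f'{html.escape(text)}</{element}>')


# An English paragraph containing one German word, tagged separately.
paragraph = ('<P LANG="en">The German word '
             + tag_language("Gr\u00fc\u00dfe", "de")
             + " means greetings.</P>")
```

The enclosing element states the default language, and nested elements
override it for the parts written in another language.
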
For more info on HTML i18n, cf. <>.

You may also wish to read other parts of the HTML 4.0 specification,
and hints for HTML authors:
and to test your HTML source against pertinent validation services:

Best wishes,
   Otto Stolz
