Re: How to use Unicode on XML/HTML pages

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Oct 27 1999 - 18:43:48 EDT


On 1999-10-22 at 1:36, Denice Szafran Liscomb <okana@okanasweb.net>
wrote:
> how to code [characters from the Latin Extended A range] on an HTML or XML
> page to make the characters appear properly.

I can only give advice for HTML. I sent most of this to the Unicode
List back in July.

> Where do I find this information?

Cf. <http://www.w3.org/TR/REC-html40/charset.html>, in particular
<http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2>: according
to the HTML 4.0 definition, you may choose the most convenient code
page (which is regarded as a transport vehicle only) for your HTML
page and use character references, such as "&euro;", "&#8364;", or
"&#x20AC;", for those characters that are not in the code page chosen
for the transfer. In practice, however, this works well *only* when you
choose UTF-8 as your transfer encoding; then you won't need to resort
to numeric character references, of course (but you are free to use
them if you find them convenient). Examples of UTF-8 based pages:
<http://www.reuters.com/unicode/iuc10/x-utf8.html>,
and the attached file.
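
The transport-encoding idea can be tried out with a small sketch. It
assumes Python 3 and its standard "xmlcharrefreplace" error handler;
the helper name to_transport and the sample string are made up for
illustration and do not come from the HTML 4.0 specification.

```python
import html


def to_transport(text: str, charset: str) -> bytes:
    """Encode text in the given charset; characters the charset lacks
    are written as decimal numeric character references (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Hypothetical sample: EURO SIGN (U+20AC) and o with double acute (U+0151).
sample = "Price: 5 \u20ac, letter: \u0151"

# Latin-1 contains neither character, so both become references.
latin1_bytes = to_transport(sample, "iso-8859-1")

# UTF-8 can carry every character directly, so no references are needed.
utf8_bytes = to_transport(sample, "utf-8")

# The named, decimal, and hexadecimal spellings denote the same character.
same = html.unescape("&euro; &#8364; &#x20AC;")

print(latin1_bytes)
print(utf8_bytes)
print(same)
```

With Latin-1 as the transport, every character outside that code page
turns into a reference; with UTF-8 the text passes through unchanged.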

This scheme is defined only for HTML 4.0, so you will need to mark your
document as an HTML 4.0 document, cf.
<http://www.w3.org/TR/REC-html40/struct/global.html#h-7.2>.
(The example page from Reuters does not, however, include this
mandatory declaration.)

You cannot legally include Latin-2 characters in pre-4.0 HTML, as HTML 3.2
mandates Latin-1. In HTML 4, you must specify any encoding other than
Latin-1; so your Latin-2, or UTF-8, encoded HTML pages must either be
sent with an appropriate HTTP header field, or they must contain a META
element declaring the charset, as described in the charset section of
the specification cited above.
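
As a rough sketch of both ways to declare the encoding, the following
Python 3 snippet assembles a minimal HTML 4.0 page with the document
type declaration and a META element, and returns the matching value
for the HTTP Content-Type header field. The function name build_page,
its parameters, and the sample text are assumptions made for this
illustration only.

```python
import html

# Document type declaration for HTML 4.0 (strict DTD).
HTML40_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" '
    '"http://www.w3.org/TR/REC-html40/strict.dtd">'
)


def build_page(title: str, body_html: str, charset: str = "utf-8"):
    """Return (Content-Type header value, page bytes) for a minimal page.

    body_html is assumed to be valid HTML already; only the title is escaped.
    """
    content_type = f"text/html; charset={charset}"
    markup = "\n".join([
        HTML40_DOCTYPE,
        "<HTML>",
        "<HEAD>",
        f'<META HTTP-EQUIV="Content-Type" CONTENT="{content_type}">',
        f"<TITLE>{html.escape(title)}</TITLE>",
        "</HEAD>",
        "<BODY>",
        body_html,
        "</BODY>",
        "</HTML>",
        "",
    ])
    # Characters missing from the charset fall back to numeric references.
    return content_type, markup.encode(charset, errors="xmlcharrefreplace")


# Hypothetical Latin-2 page containing the Polish city name "Lodz"
# written with its proper diacritics.
header_value, payload = build_page(
    "Test", "<P>\u0141\u00f3d\u017a</P>", "iso-8859-2"
)
print("Content-Type:", header_value)
```

Either the header field or the META element suffices; the sketch
emits both so that they cannot disagree with each other.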

I also recommend tagging the various parts of your HTML page with their
respective languages,
cf. <http://www.w3.org/TR/REC-html40/struct/dirlang.html>, in particular
<http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1.1> combined
with <http://sunsite.auc.dk/RFC/rfc/rfc1766.html>,
<http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_639.html> and
<http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_3166.html>.
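
Language tagging might look like the sketch below, again in Python 3.
The helper tag_language and its sample inputs are hypothetical; the
LANG attribute and the tag syntax (an ISO 639 language code, optionally
followed by an ISO 3166 country code) are those of the documents cited
above.

```python
import html


def tag_language(text: str, lang: str, element: str = "SPAN") -> str:
    """Wrap plain text in an element carrying a LANG attribute.

    lang is expected to be an RFC 1766 tag such as "de" or "en-US";
    this sketch does not validate it.
    """
    return (
        f'<{element} LANG="{html.escape(lang, quote=True)}">'
        f"{html.escape(text)}</{element}>"
    )


print(tag_language("Guten Tag", "de"))
print(tag_language("a < b", "en-US", "P"))
```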

For more info on HTML i18n, cf. <http://www.w3.org/International/>.

You may also wish to read other parts of the HTML 4.0 specification,
and hints for HTML authors:
  <http://www.w3.org/TR/REC-html40/>
  <http://www.w3.org/MarkUp/#guidelines>
  <http://www.w3.org/WAI/GL/#Current_Draft>
and to test your HTML source against pertinent validation services:
  <http://validator.w3.org/>
  <http://www.cast.org/bobby/>

Best wishes,
   Otto Stolz




