Re: How to use Unicode on XML/HTML pages

From: Tony Graham (tgraham@mulberrytech.com)
Date: Mon Oct 25 1999 - 13:01:23 EDT


At 22 Oct 1999 16:12 -0700, Magda Danish (Unicode) wrote:
> -----Original Message-----
> From: Okana [mailto:okana@okanasweb.net]
> Sent: Friday, October 22, 1999 1:36 PM
> To: info@unicode.org
> Subject: where do I find this info?
...
> I have a web site using ISO-8859-2 as my character set. I know all the
> proper codes to make the central european characters appear the way they
> should on a browser.
>
> Now I am trying to learn XML, which calls for Unicode character sets. I
> think what I want is in UTF-8; the characters are on the pages for
> Latin Extended-A. The codes are all there, and I was glad to find
> them. But what I cannot see is how to code these on an HTML or XML page
> to make the characters appear properly.
>
> Where do I find this information? It is probably right there, and I
> don't know that's what it is. In a windows environment with
> pan-european support on the browser, for a capital L with a slash all I
> needed to type was alt-0163. I tried putting in #&U+0141, to no avail.
> Help?

Try "&#x0141;". If you're using Netscape Navigator, you might need to
use a decimal character reference: &#321;.

The XML Recommendation and the HTML 4.0 Recommendation (see
http://www.w3c.org) both describe how to make numeric character
references to Unicode characters.

Decimal character references have the form &#nnnn;, where 'nnnn' is
the decimal representation of a character's Unicode Scalar Value (see
section 3.7 of the Unicode Standard, Version 2.0). For all the
currently defined characters, a character's Unicode Scalar Value is
its code value.

Hexadecimal character references have the form &#xhhhh;, where 'hhhh'
is the hexadecimal representation of a character's Unicode Scalar
Value.
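
As a rough illustration of the two forms just described, here is a
minimal Python sketch. The helper names decimal_ref and hex_ref are made
up for this example; the "xmlcharrefreplace" error handler is part of
Python's standard codecs and writes decimal references.

```python
def decimal_ref(ch: str) -> str:
    """Return the decimal character reference for a single character."""
    return "&#%d;" % ord(ch)


def hex_ref(ch: str) -> str:
    """Return the hexadecimal character reference for a single character."""
    return "&#x%04X;" % ord(ch)


# U+0141 LATIN CAPITAL LETTER L WITH STROKE
print(decimal_ref("\u0141"))  # &#321;
print(hex_ref("\u0141"))      # &#x0141;

# The standard library can write the decimal form for every non-ASCII
# character in a string while leaving ASCII characters untouched.
word = "\u0141\u00f3d\u017a"
print(word.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```

The hexadecimal form lines up with the U+hhhh notation used in the
Unicode code charts, which is why it is often the easier one to write by
hand.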

Numeric character references can include leading zeroes but don't need
to, so &#56;, &#056;, &#x38;, and &#x0038; are all valid numeric
references to U+0038, DIGIT EIGHT.
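
One way to check the leading-zero rule is to decode a few references and
compare the results. This is only a sketch using html.unescape from the
Python standard library; the list of spellings is my own choice for
illustration.

```python
from html import unescape

# Four spellings of a reference to U+0038, with and without leading zeroes.
spellings = ["&#56;", "&#056;", "&#x38;", "&#x0038;"]

decoded = [unescape(s) for s in spellings]
print(decoded)

# Every spelling decodes to the same single character.
assert all(ch == "\u0038" for ch in decoded)
```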

If you are using XML, you can use XML tools or you can use Internet
Explorer 5, which is currently the only HTML browser that also
supports XML.

You can, however, continue to use ISO-8859-2 with your XML, provided
that your XML parser supports it and that you indicate the encoding in
your XML declaration at the beginning of the XML file:

------------------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-2"?>
<!-- Your XML goes here -->
------------------------------------------------------------

You can still make numeric character references to any Unicode
character, whether or not it's in ISO 8859-2.
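
To see both points at once, here is a small Python sketch that parses an
ISO-8859-2 encoded document containing a numeric character reference to
a character outside ISO 8859-2. It assumes the parser bundled with
Python's xml.etree.ElementTree accepts this encoding, which is my
understanding of the standard CPython build; the element name "name" and
the sample text are invented for the example.

```python
import xml.etree.ElementTree as ET

# U+0141 is stored as the single byte 0xA3 in ISO 8859-2. The euro sign
# (U+20AC) is not in ISO 8859-2 at all, so it can only appear here as a
# numeric character reference.
document = (
    '<?xml version="1.0" encoding="ISO-8859-2"?>'
    "<name>\u0141 &#x20AC;</name>"
).encode("iso-8859-2")

root = ET.fromstring(document)
print(repr(root.text))
```

After parsing, the application sees ordinary Unicode text and cannot
tell which characters came from encoded bytes and which came from
references.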

If you are using HTML, you can use any browser, but not all browsers
support hexadecimal character references. If you are using UTF-8 with
HTML, then you should include a <meta> element indicating the
character encoding (unless your web server can supply the Content-Type
header):

------------------------------------------------------------
<html>
<head>
<title><!-- Your Title --></title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF">
<!-- The body of the page goes here -->
</body>
</html>
------------------------------------------------------------
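
Since not every HTML browser handles hexadecimal references, one option
is to rewrite them as decimal references before publishing a page. The
following Python sketch shows the idea; the function name
hex_refs_to_decimal and the sample markup are hypothetical, and the
pattern only looks for well-formed references ending in a semicolon.

```python
import re

# Matches &#xhhhh; (HTML also tolerates an upper-case X).
_HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_decimal(markup: str) -> str:
    """Rewrite every hexadecimal character reference as a decimal one."""
    return _HEX_REF.sub(lambda m: "&#%d;" % int(m.group(1), 16), markup)


print(hex_refs_to_decimal("<p>&#x0141;&#xF3;d&#x17A;</p>"))
```

Decimal references and named entities are left alone, so the function
can be run over a page more than once without changing the result.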

Regards,

Tony Graham
======================================================================
Tony Graham mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


