"Unicode in XML and other Markup Languages" published today

From: Misha Wolf (Misha.Wolf@reuters.com)
Date: Fri Dec 15 2000 - 15:07:47 EST


The following document was today jointly published by the World Wide Web
Consortium (W3C) and the Unicode Consortium:

   Unicode in XML and other Markup Languages

   W3C Note 15 December 2000
   http://www.w3.org/TR/unicode-xml/

   Unicode Technical Report #20
   http://www.unicode.org/unicode/reports/tr20/

The document's Introduction is reproduced below
-----------------------------------------------

The Unicode Standard [Unicode] defines the universal character set. Its
primary goal is to provide an unambiguous encoding of the content of
plain text, ultimately covering all languages in the world. Currently in
its third major version, Unicode contains a large number of characters
covering most of the currently used scripts in the world. It also
contains additional characters for interoperability with older character
encodings, and characters with control-like functions included primarily
for reasons of providing unambiguous interpretation of plain text.
Unicode provides specifications for use of all of these characters.

For document and data interchange, the Internet and the World Wide Web
increasingly make use of marked-up text such as HTML and XML. In many
instances, markup provides the same features as, or features essentially
similar to, those provided by format characters in the Unicode Standard
for use in plain text. Compatibility characters are another special
category of characters provided by Unicode. While there may be valid reasons
to support these characters and their specifications in plain text,
their use in marked-up text can conflict with the rules of the markup
language. Formatting characters are discussed in chapters 2 and 3,
compatibility characters in chapter 4.
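
As a rough illustration of the two character categories mentioned above,
the following Python sketch flags characters in a string that may deserve
a second look before being placed in marked-up text. It is not taken from
the report. It assumes that general category "Cf" is an acceptable
approximation of "format characters" and that a tagged decomposition
(such as <compat>) identifies a "compatibility character"; the report
itself gives the authoritative lists. The helper names classify and scan
are hypothetical.

```python
import unicodedata


def classify(ch):
    """Return rough labels for a single character (hypothetical helper).

    Assumption: general category "Cf" approximates "format character",
    and a decomposition starting with a <tag> marks a compatibility
    character. Canonical decompositions (no tag) are not flagged.
    """
    labels = []
    if unicodedata.category(ch) == "Cf":
        labels.append("format")
    if unicodedata.decomposition(ch).startswith("<"):
        labels.append("compatibility")
    return labels


def scan(text):
    """List (index, code point, name, labels) for every flagged character."""
    report = []
    for index, ch in enumerate(text):
        labels = classify(ch)
        if labels:
            name = unicodedata.name(ch, "<unnamed>")
            report.append((index, f"U+{ord(ch):04X}", name, labels))
    return report


if __name__ == "__main__":
    # U+202E is a bidi override (format); U+FB01 is the "fi" ligature.
    for entry in scan("abc\u202edef \ufb01n"):
        print(entry)
```

A tool built along these lines could only report candidates; whether a
given character should be replaced by markup depends on the markup
language in use.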

The issues of using Unicode characters with marked-up text depend to
some degree on the rules of the markup language in question and the set
of elements it contains. In a narrow sense, this document concerns
itself only with XML, and to some extent HTML. However, much of the
general information presented here should be useful in a broader
context, including some page layout languages.

Note: Many of the recommendations of this report depend on the
availability of particular markup. Where possible, appropriate DTDs or
Schemas should be used or designed to make such markup available, or the
DTDs or Schemas used should be appropriately extended. The current
version of this document makes no specific recommendations for the
design of DTDs or Schemas, or for the use of particular DTDs or
Schemas, but the information presented here may be useful to designers
of DTDs and Schemas, and to people selecting DTDs or Schemas for their
applications. The recommendations of this report do not apply in the
case of XML used for blind data transport and similar cases.

Misha Wolf
W3C I18N WG chair
Unicode Technical Committee member



