Nineteenth International Unicode Conference

Internationalized Resource Identifiers: From Specification to Testing

Martin Dürst - W3C/Keio University

Intended Audience:	Manager, Software Engineer, Systems Analyst
Session Level:	Intermediate

Uniform Resource Identifiers (URIs) are one of the foundations of the World Wide Web, but they are limited to a subset of ASCII characters.

To extend the character repertoire, a new type of identifier is defined, called Internationalized Resource Identifiers (IRIs). By design, all URIs are IRIs, and all IRIs can be converted to URIs by encoding their non-ASCII characters in UTF-8 and writing the resulting bytes as URI %-escapes. IRIs therefore keep all the interesting properties of URIs. Also, with some care, URIs can be converted back to IRIs if they were originally IRIs.
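
The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the specification: the function names iri_to_uri and uri_to_iri and the example.org address are hypothetical, special handling of host names is ignored, and the reverse direction only unescapes byte sequences that are valid UTF-8, which is one possible reading of "with some care".

```python
import re

# Runs of %-escapes whose byte values are all >= 0x80 (i.e. non-ASCII bytes).
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_to_uri(iri: str) -> str:
    """Encode each non-ASCII character as UTF-8 and %-escape the bytes.

    ASCII characters, including existing %-escapes, are left untouched,
    so a string that is already a URI passes through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def uri_to_iri(uri: str) -> str:
    """Unescape %-escapes only where they form valid UTF-8 sequences.

    Assumption: escapes that do not decode as UTF-8 (for example bytes
    from a legacy encoding) are kept as they are, as are escapes of
    ASCII characters.
    """
    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)

    return _NON_ASCII_ESCAPES.sub(decode_run, uri)


if __name__ == "__main__":
    iri = "http://example.org/Dürst"  # hypothetical example address
    uri = iri_to_uri(iri)
    print(uri)              # http://example.org/D%C3%BCrst
    print(uri_to_iri(uri))  # http://example.org/Dürst
```

In this sketch a lone escape such as %FC, which is not valid UTF-8, is left escaped, so only URIs that plausibly started out as IRIs are converted back.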

The paper will discuss progress on the specification, both in the IETF and at the W3C (in the Character Model), as well as issues for implementers and users.

At the W3C, this backwards-compatible approach has already been pursued for some time, and has been realized in the specifications for (X)HTML, XML, XML Schema, XLink, RDF, and so on. The recent introduction of the term IRI has made it easier to describe the approach.

Particular emphasis will be given to examples. These are important for understanding the various layers of the conceptual encoding model for URIs and IRIs, because in many simple cases most of the layers cannot be observed explicitly. Many of the examples will also be available as tests.


22 Jun 2001, Webmaster