The World Wide Web (WWW) is a collection of interoperating applications that exchange data using various protocols and formats. A large part of the data exchanged is text. In order for this text to be handled correctly independent of character encoding, format, protocol, or application, a clear understanding of character encoding and processing issues, i.e. a Character Model, is necessary.

The paper will discuss the various aspects of the character model. The base part of the character model deals with character encoding, including issues such as the distinction between bytes, characters, and glyphs, and recommendations on escaping techniques to include any Unicode character in any character encoding. This part is based on the model of RFC 2070 (Internationalization of HTML) and includes experience from HTML 4.0, XML 1.0, and CSS.
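As a rough illustration of the distinction between characters and bytes, and of one escaping technique, the following Python sketch may help. It is not taken from the paper: the sample strings and the helper name `escape_for_ascii` are assumptions made for this example, and the escape shown is the decimal numeric character reference form used by HTML and XML.

```python
import html

# One string of four characters; the last one is U+00E9 (e with acute).
text = "caf\u00e9"

# The same characters become different byte sequences in different encodings.
utf8_bytes = text.encode("utf-8")         # U+00E9 takes two bytes here
latin1_bytes = text.encode("iso-8859-1")  # U+00E9 takes one byte here


def escape_for_ascii(s: str) -> str:
    """Hypothetical helper: replace every character that US-ASCII cannot
    encode with a decimal numeric character reference such as &#26085;."""
    return s.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Two characters outside ASCII (U+65E5 and U+672C), carried in ASCII-only text.
escaped = escape_for_ascii("\u65e5\u672c")

# A receiver that understands the references recovers the original characters.
restored = html.unescape(escaped)
```

Here the character count stays at four while the byte count depends on the encoding, and the escaped form lets a document in a restricted encoding still refer to any Unicode character.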

With the WWW changing more and more from a one-way content-delivery system to a very large integrated application, more areas of character handling seem to need clear specifications. This applies in particular to the handling of canonical equivalences (precomposed vs. decomposed) and to character indexing.
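To make the two problem areas concrete, here is a minimal Python sketch using the standard `unicodedata` module. It only illustrates why precomposed and decomposed forms, and different ways of counting, cause trouble; it does not represent whatever the character model ends up specifying. The function name `canonically_equal` and the choice of normalization form for the comparison are assumptions made for this example.

```python
import unicodedata

precomposed = "\u00e9"   # one code point: e with acute
decomposed = "e\u0301"   # two code points: e followed by combining acute


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after bringing both
    to the same normalization form (NFC is an arbitrary choice here)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A naive code point comparison treats the two spellings as different,
# although they are canonically equivalent.
naive_equal = precomposed == decomposed
normalized_equal = canonically_equal(precomposed, decomposed)

# Indexing depends on what is being counted.
code_points = (len(precomposed), len(decomposed))
utf8_lengths = (len(precomposed.encode("utf-8")),
                len(decomposed.encode("utf-8")))

# A character outside the Basic Multilingual Plane (U+1D11E) is one
# code point but two 16-bit code units in UTF-16.
clef = "\U0001D11E"
utf16_units = len(clef.encode("utf-16-le")) // 2
```

The same visible text can therefore have length 1 or 2 in code points and 2 or 3 in UTF-8 bytes, which is why applications that exchange offsets into text need to agree on what an index counts.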

As the character model is currently under development and will evolve as new needs arise, the actual presentation may differ somewhat from this summary.

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

24 January 1999, Webmaster