The World Wide Web (WWW) is a collection of interoperating applications that exchange data using various protocols and formats. A large part of the data exchanged is text. For this text to be handled correctly regardless of character encoding, format, protocol, or application, a clear understanding of character encoding and processing issues, i.e., a Character Model, is necessary.

This paper discusses the various aspects of the character model. The base part of the model deals with character encoding. It covers issues such as the distinction between bytes, characters, and glyphs, and gives recommendations on escaping techniques that make it possible to represent any Unicode character in a document, whatever its character encoding. This part is based on the model of RFC 2070 (Internationalization of HTML) and draws on experience from HTML 4.0, XML 1.0, and CSS.
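
As an illustration of the byte/character distinction and of escaping, the following minimal sketch uses only the Python standard library. It assumes numeric character references (as used in HTML and XML) as the escaping technique, and relies on Python's built-in `xmlcharrefreplace` codec error handler, which writes decimal references. The helper names `byte_and_char_counts` and `escape_to_encoding` are hypothetical; this is not the character model's own specification.

```python
import html

def byte_and_char_counts(text: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))

def escape_to_encoding(text: str, encoding: str) -> bytes:
    """Encode text in `encoding`, writing every character the encoding cannot
    represent as a decimal numeric character reference such as &#233;."""
    return text.encode(encoding, errors="xmlcharrefreplace")

word = "café"
print(byte_and_char_counts(word))   # (4, 5): "é" takes two bytes in UTF-8
escaped = escape_to_encoding(word, "ascii")
print(escaped)                      # b'caf&#233;'
# A receiver that understands character references recovers the original text.
print(html.unescape(escaped.decode("ascii")) == word)  # True
```

The escaped bytes are pure ASCII, yet no character is lost. Characters the target encoding does support (here `c`, `a`, `f`) are left as-is, and only the unrepresentable `é` is escaped.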

As the WWW changes from a one-way content-delivery system into a very large integrated application, more areas of character handling seem to need clear specifications. This applies in particular to the handling of canonical equivalences (precomposed vs. decomposed characters) and to character indexing.
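
Both issues can be seen in a few lines of Python. This is a sketch under stated assumptions: it compares strings after normalizing to Unicode Normalization Form C (NFC) via the standard `unicodedata` module, and it measures length in three units an index might count. The abstract does not say which normalization form or indexing unit the model will adopt. The helpers `canonically_equivalent` and `lengths` are hypothetical.

```python
import unicodedata

PRECOMPOSED = "\u00e9"   # "é" as one code point (LATIN SMALL LETTER E WITH ACUTE)
DECOMPOSED = "e\u0301"   # "e" + COMBINING ACUTE ACCENT, displayed the same way

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def lengths(text: str) -> dict[str, int]:
    """Length of `text` in three different units an index could be based on."""
    return {
        "code points": len(text),
        "UTF-16 code units": len(text.encode("utf-16-le")) // 2,
        "UTF-8 bytes": len(text.encode("utf-8")),
    }

print(PRECOMPOSED == DECOMPOSED)                        # False: different code points
print(canonically_equivalent(PRECOMPOSED, DECOMPOSED))  # True
print(lengths(DECOMPOSED))    # {'code points': 2, 'UTF-16 code units': 2, 'UTF-8 bytes': 3}
print(lengths("\U0001D11E"))  # {'code points': 1, 'UTF-16 code units': 2, 'UTF-8 bytes': 4}
```

A plain comparison treats the two spellings of "é" as different strings. An index such as "character 1" points to different positions depending on whether code points, UTF-16 code units, or bytes are counted.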

As the character model is still under development and will evolve as new needs arise, the actual presentation may differ somewhat from this summary.

