UTC/1999-021
Liaison report from W3C
Subject: W3C XML CG statement on annotation characters
-------------

Hello Lisa, hello Arnold,

I have received the following statement on annotation characters from the Chair of the W3C XML Coordination Group, and I am forwarding it here in my role as liaison from W3C to the Unicode Consortium. I hope you can bring this to the due attention of the UTC.

Regards, Martin.
------------

The XML Coordination Group of the World Wide Web Consortium wishes to express its concern regarding the inclusion of annotation characters in Unicode and to urge that the UTC take a position opposing their adoption.

BACKGROUND: THE XML CG

The W3C XML Coordination Group continues the work of the W3C XML Working Group (1996-1998), which designed XML, the Extensible Markup Language. The XML CG includes the chairs of the five current W3C XML Working Groups (Schema, Linking, Syntax, Infoset, and Fragment) and coordinates this work with other W3C working groups, most importantly the Document Object Model (DOM), Extensible Stylesheet Language (XSL), and Internationalization (I18N) Working Groups.

XML CG POSITION REGARDING ANNOTATION CHARACTERS IN UNICODE

The XML CG is deeply concerned about the inclusion of these three annotation characters in the Unicode standard:

   U+FFF9 Interlinear Annotation Anchor
   U+FFFA Interlinear Annotation Separator
   U+FFFB Interlinear Annotation Terminator

It is our view that the functionality sought for in adding these characters should be provided using markup, not characters, and that if there are particular implementation needs that have caused this proposal to be put forward, architectural mechanisms should be identified that would satisfy them without imposing a stateful mechanism like ISO 2022 onto the World Wide Web.

Our reasons for taking this position are as follows:

1. The annotation characters appear to contravene a basic principle of Unicode design.
According to Section 3.11 of the Unicode 2.0 specification,

   In general, Unicode does not supply formatting codes; formatting is left up to higher-level protocols.

The only exception to this rule is in the case of bidirectional behavior, and then only because "there are circumstances where an implicit bidirectional ordering is not sufficient to produce comprehensible text." No such need forces a violation of the basic design principle in this case. The annotation characters solve no problems that are beyond the reach of higher-level protocols, but their use can impair the integrity of those protocols.

2. The occurrence of annotation characters in XML content will create rendering problems by overlaying a concurrent and conflicting set of formatting controls on top of the controls driven by XML tagging. In particular, the annotation characters violate a basic assumption of XSL, which is that style boundaries coincide with element boundaries in the source document. The occurrence of annotation characters in XML content will directly conflict with stylesheets applied to documents containing that content.

3. The name "annotation characters" suggests that these are intended to deal with interlinear annotations in general, but the mechanism does not scale well to nested interlinear material (interlinear notes on interlinear notes) or to multiple levels of interlinear annotation on the same base. Nesting and multiple annotations on the same base text are well known in both Western and Eastern writing traditions; both are readily handled by higher-level protocols such as XML.

4. Similarly, the recursive embedding of such characters in MathML data could cause severe problems for rendering engines.

5. The adoption of annotation characters would set a precedent with harmful consequences for the users of XML, opening the door for the addition of Unicode characters to shift in and out of bold, italic, and other typographic modes.
From an architectural standpoint, there is no difference between shifting into annotation mode and shifting into 36 point type. Such shifting is opaque to XML tools and, as is well known, destroys the ability to repurpose documents that use it.

SUMMARY

The XML Coordination Group views the inclusion of annotation characters in Unicode as a basic layering violation with grave consequences for the higher-level protocols that will be the most important users of Unicode. We urge the UTC to reject their adoption.

Jon Bosak
Chair, W3C XML Coordination Group

#-#-# Martin J. Dürst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org