L2/00-118

Agenda topic for next UTC: Parts of the Standard

Two contributions, combined into one document:

1. Mark Davis
2. Ken Whistler

1. Mark Davis on 04/05/2000

Problem:

Various other organizations are confused by the way we handle TRs. They don't realize that a given version of the Unicode Standard specifies TR versions. The name TR itself is misunderstood by some people, since in other standards organizations (e.g. ISO) a TR is not understood to be a normative document.

[Of course, the W3C and some other organizations use the odd term "Recommendation", as in "Extensible Markup Language (XML) 1.0 W3C Recommendation". Still, I suppose that is better than "Extensible Markup Language (XML) 1.0 W3C Suggestion".]

Proposal:

1. For those TRs that are considered part of the Unicode Standard, rename and reset the version number to reflect that. Use the phrase "The Unicode Standard, Version 3.0, Part N", since people are familiar with the notation of "Part" from ISO. Reserve "Part 1" to refer to the current book, and "Part 2" to refer to the Unicode Character Database. Otherwise number them consecutively.

Looking at http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode 3.0.0 , this would have the following implications:

UTR #15: Unicode Normalization Forms, Version 18.0
=> TUS 3.0.0, Part 3: Unicode Normalization Forms
[or => TUS Part 3: Unicode Normalization Forms, Version 3.0.0]

UTR #14: Line Breaking Properties, Version 6.0
=> TUS 3.0.0, Part 4: Line Breaking Properties

UTR #11: East Asian Character Width, Version 5.0
=> TUS 3.0.0, Part 5: East Asian Character Width

UTR #13: Unicode Newline Guidelines, Version 5.0
=> TUS 3.0.0, Part 6: Unicode Newline Guidelines

2. Modify the following to reflect the change in status of these parts:

http://www.unicode.org/unicode/reports/techreports.html
http://www.unicode.org/unicode/standard/standard.html
http://www.unicode.org/unicode/standard/versions
http://www.unicode.org/unicode/uni2book/u2.html
http://www.unicode.org/unicode/standard/versions/enumeratedversions.html

3. Move these documents into the standards directory on the website.

4. In the old TR URL locations, put text explaining that the TRs have "graduated" to become parts of the Unicode Standard.

5. Lock the version numbers of the Parts to the main standard. That is, there will *only* be a new version of the text of any part when there is a new version of the standard. E.g. we would only change them when we introduce Unicode 3.0.1 (or whatever the next version of the UCD is).

6. There may be public draft versions of the upcoming documents. Those would be clearly cross-linked and marked, e.g.

TUS Part 3: Unicode Normalization Forms: Draft Version

Mark

==============================================================================

2. Kenneth Whistler on 04/05/2000

Mark,

Is this open for discussion and suggestion now, before you submit this as a UTC document? Or is this email text your final proposal?

I agree with your assessment of the problem regarding how other organizations perceive our Technical Reports. However, I am concerned about some of the details of your proposal for addressing that problem.

While having official "Parts" for the Unicode Standard would certainly convey to ISO-oids that they are indeed normative parts of the standard, I think going down that route would be disastrously confusing, since the numbered "Parts" of the Unicode Standard would have different statuses and not be aligned with the numbered "Parts" of the 10646 standard that we are trying (successfully) to maintain synchrony with.

And consider the track record with ISO Parts.
Even among the cognoscenti, there is a tendency to trip over the distinction between the second edition of the first part of 10646 (10646-1:2000) and the first edition of the second part of 10646 (CD 10646-2). Everybody not in the know is confused. And consider the 8859 parts: Part 1 is Latin 1, Part 2 is Latin 2, but Part 9 is Latin 5, Part 15 is Latin 9 nicknamed Latin 0. Everybody outside WG3 also is starting to get those mixed up--not even counting the differences which are starting to mount between the different editions of those parts. If we start to have a Unicode 3.0.0 Part 3, and so on, this will just add to the confusion.

So to summarize to this point, I definitely don't think we should borrow the "Part" terminology and apply it nonconformably to the Unicode Standard simply to address the perception issue about our Technical Reports.

Now let's consider what the current situation is. We have a series of different "things" that the UTC is maintaining -- some of them normative and some of them informative. These come in several categories.

A. The printed Unicode Standard book.
This contains a huge amount of normative material, but also a huge amount of informative material. The publication of the printed book is definitive of major versions of the standard.

B. The Unicode Character Database.
This consists of multiple data files now, many of which contain normative material, but some of which are informative only. Updating the UCD data files is definitive of minor and update versions of the standard.

C. Normative Technical Reports that are part of "the standard".
These are approved UTR's that are officially part of the versioned standard. Currently we have:

   UTR #11 East Asian Character Width, version 5.0
   UTR #13 Unicode Newline Guidelines, version 5.0
   UTR #14 Line Breaking Properties, version 6.0
   UTR #15 Unicode Normalization Forms, version 18.0

D. Normative Technical Reports that are not part of "the standard".
These constitute little independent standards of their own. Currently we have:

   UTR #6 A Standard Compression Scheme for Unicode, version 3.1
   UTR #10 Unicode Collation Algorithm, version 5.0

UTR #16 UTF-EBCDIC could also fit into this category, but currently has no conformance clause in it.

E. Informative Technical Reports
Other technical reports, including odd ones like UTR #7 that contain text that reads like a standard intentionally, but which will be supplanted by the formal encoding of Plane 14 characters when CD 10646-2 progresses to IS.

The perception problem is mainly for category C. (And currently most importantly for normalization, since that is the issue that other standards are concerned about for current referencing.)

Before we start renaming and renumbering everything, I think we should first figure out what we are doing.

My first suggestion would be to do the rectification of names for categories C, D, and E above (assuming that everyone is already used to what we call categories A and B). And we should clearly distinguish between the category (or type) of these things, and their approval status. Part of the problem we have now is that the "status" of a Technical Report (see www.unicode.org/unicode/reports/techreports.html) is now overloaded and conflated to convey both type *and* approval status. That should end.

My suggestions for a starting point follow.

Category:

A. The Book (currently one volume, but may end up multi-volume)

B. The Unicode Character Database (currently many files, and will be more)

C. Unicode Standard Supplements
These are the "graduated" UTR's synchronized with versions of "the standard", and effectively stand as separately published normative annexes to the standard. (We could even call them "Annexes", to drive the point home, I suppose.)

D. Unicode Technical Standard (normative)

E. Unicode Technical Report (informative only)

Categories A, B, and C, considered in toto, constitute a numbered version of "The Unicode Standard", and are specified and tracked in detail in the Enumerated Versions portion of the website.

Categories D and E are distinct from "The Unicode Standard", though clearly related to it in some way; their versioning is independently maintained.

Approval Status for UTR's, UTS's, and USS's:

1. No status (working document for discussion)
2. Proposed Draft (formatted and posted for internal review)
3. Draft (posted for public review)
4. Approved (technical content formally approved by UTC)
5. Standardized (a further approval required for a UTS or a USS, or to change the category of an approved UTR to a UTS or a USS)
6. Superseded

I would like to discuss this general framework and come to some kind of consensus as to how we should proceed here and as to how we should publicize the framework to educate ourselves and others about our intended process *before* we start considering the detail work of whether and how we start renaming or renumbering already approved UTR's and reorganizing the techreports part of the website.

Furthermore, as a part of this, I would suggest that we spell out *specifically* and exactly what portion of "stuff" we maintain is intended to be maintained in synchronization with 10646 (and 14651, for that matter), so that those outside UTC can have reasonable guidelines as to what to consider when viewing our standard from the viewpoint of the ISO standards.

--Ken