L2/01-211

From: Rick McGowan [rick@unicode.org]
Sent: Friday, May 18, 2001 1:18 PM

An Informational RFC about Unicode Consortium procedures, policies, standard versioning, stability, and public access.

1. About The Unicode Consortium

The Unicode Consortium is a corporation. Legally speaking it is a "California Nonprofit Mutual Benefit Corporation", organized under section 501(c)(6) of the Internal Revenue Code [see http://www.irs.ustreas.gov/prod/bus_info/eo/bus-orgs.html]. As such, it is a "business league" not focussed on profiting by sales or production of goods and services, but neither is it formally a "charitable" organization. It is an alliance of member companies whose purpose is to "extend, maintain, and promote the Unicode Standard". To this end, the consortium keeps a small office, a few editorial and technical staff, World Wide Web presence, and mail-list presence.

The corporation is presided over by a Board of Directors who meet annually. The board appoints Officers of the corporation to run the daily operations.

Membership in the consortium is open to "all corporations, other business entities, governmental agencies, not-for-profit organizations and academic institutions" who support the consortium's purpose. Formally, one class of voting membership is recognized, and dues-paying members are typically for-profit corporations, research and educational institutions, or national governments. Each such full member sends representatives to meetings of the Unicode Technical Committee (see below), as well as to a brief annual Membership meeting.

2. The Unicode Technical Committee

The Unicode Technical Committee (UTC) is the technical decision-making body of the consortium. The UTC inherited the work and prior decisions of the Unicode Working Group (UWG) that was active prior to formation of the consortium. Formally, the UTC is a technical body instituted by resolution of the board of directors.
Each member appoints one principal and one or two alternate representatives to the UTC. UTC representatives frequently do, but need not, act as the ordinary member representatives for the purposes of the annual meeting.

The UTC is presided over by a Chair and Vice-Chair, appointed by the Board of Directors for an unspecified term of service.

The UTC meets 4 to 5 times a year to discuss proposals, additions, and various other technical topics. There is no fee for participation in the UTC meetings. Meeting agendas are not generally posted to any public forum, but meeting dates, locations, and logistics are posted well in advance at:

    http://www.unicode.org/unicode/timesens/calendar.html

At the discretion of the UTC chair, meetings are open to participation of member and liaison organizations, and to observation by others. The minutes of meetings are posted publicly on the Unicode Web site, http://www.unicode.org.

Meetings of the UTC are held frequently in the San Francisco Bay Area and occasionally on the East Coast of North America, where the majority of full members have their offices. Meetings typically last 3 to 4 full days. Rarely, a portion of a meeting will be declared a "closed caucus" for member representatives.

Most UTC meetings are held jointly with NCITS Technical Committee L2, the body responsible for Character Code standards in the United States. They constitute "ad hoc" meetings of the L2 body.

3. Unicode Technical Committee Procedures

The formal procedures of the UTC are publicly available in a document entitled "UTC Procedures" available from the Consortium, and on the website:

    http://www.unicode.org/unicode/consortium/utc-procedures.html

Despite the invocation of Robert's Rules of Order, UTC meetings are conducted with relative informality in view of the highly technical nature of most discussions. Meetings focus on items from a technical agenda organized and published by the UTC Chair prior to the meeting.
Technical items are usually proposals in one of the following categories:

    1. Addition of new scripts
    2. Addition of new characters or small batches of characters
    3. Changes to architecture or semantics of characters
    4. Preparation and Editing of Technical Reports and Standards

Typical outputs of the UTC are:

    1. The Unicode Standard, major and minor versions
    2. Unicode Technical Reports
    3. Stand-alone Unicode Technical Standards
    4. Formal resolutions
    5. Liaison statements and instructions to the Unicode liaisons to other organizations.

For each technical item on the meeting agenda, there is a general process as follows:

    1. Introduction by the topic sponsor
    2. Proposals and discussion
    3. Consensus statements or formal motions

4. Unicode Technical Committee Motions

Technical topics of any complexity never proceed from initial proposal to final ratification or adoption into the standard in the course of one UTC meeting. The UTC members and presiding officers are aware that technical changes to the standard have broad consequences to other standards, implementors, and end-users of the standard. Input from other organizations and experts is often vital to the understanding of various proposals and for successful adoption into the standard.

Technical topics are decided in UTC through the use of formal motions, either taken in meetings, or by means of 30-day letter ballots. Formal UTC motions are of two types:

    1. Simple motions
    2. Precedents

Simple motions may pass with a simple majority constituting more than 50% of the qualified voting members, or by a special majority constituting 2/3 or more of the qualified voting members.

Precedents are defined, according to the UTC Procedures, as either (A) an existing Unicode Policy, or (B) an explicit precedent. Precedents must be passed or overturned by a special majority.

Examples of implicit precedents include:

    1. Publication of a character in the standard
    2. Published normative character properties
    3.
       Algorithms required for formal conformance

An Explicit Precedent is a policy, procedure, encoding, algorithm, or other item that is established by a separate motion saying (in effect) that a particular prior motion establishes a precedent.

5. Unicode Consortium Policies

Because the Unicode Standard is continually evolving to approach the ideal of encoding "all the world's scripts", new characters will constantly be added. In this sense, the standard is unstable: in the standard's useful lifetime, there may never be a final point at which no more characters are added. Realizing this, the Consortium has adopted certain policies to promote and maintain stability of the characters that are already encoded, as well as laying out a Roadmap to future encodings.

The overall policies of the Consortium with regard to encoding stability are published on the web at this URL:

    http://www.unicode.org/unicode/standard/policies.html

Deliberations and encoding proposals in the UTC are bound by these policies. The general effect of the policies may be stated in this way: once a character is encoded, it will not be moved or removed and its name will not be changed. Any of those actions has the potential for causing obsolescence of data, and they are not permitted.

The canonical combining class and decompositions of characters will not be changed in any way that affects normalization. In this sense normalization, such as that used for International Domain Naming and "early normalization" for use on the World Wide Web, is fixed and stable for every character at the time that character is encoded.

Property values of characters, such as directionality for the Unicode Bidi algorithm, may be changed in some circumstances. As less well-documented characters and scripts are encoded, the exact character properties and behavior may not be well known at the time the characters are first encoded.
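
As an illustration of the distinction drawn above, the following minimal Python sketch reads, for one character, the values that the policies treat as fixed (name, canonical combining class, decomposition) alongside one that may be adjusted (bidirectional class). It assumes Python's standard unicodedata module; the helper name property_snapshot is hypothetical and is not part of any Unicode Consortium tooling.

```python
import unicodedata


def property_snapshot(ch: str) -> dict:
    """Collect a few Unicode Character Database values for one character.

    Hypothetical helper for illustration only.
    """
    return {
        # Treated as fixed once the character is encoded.
        "name": unicodedata.name(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        # A property value that may be adjusted in some circumstances.
        "bidi_class": unicodedata.bidirectional(ch),
    }


snapshot = property_snapshot("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE

# Canonical decomposition (NFD) relies on the decomposition mapping and
# the combining classes, which is why those values are kept stable.
decomposed = unicodedata.normalize("NFD", "\u00e9")
```

Comparing such snapshots across the Unicode versions shipped with different Python releases (reported by unicodedata.unidata_version) is one informal way to observe which values stay constant.
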
As more experience is gathered in implementing the newly encoded characters, adjustments in the properties may become necessary. This re-working is kept to a minimum. New and old versions of the relevant property tables are made available on the Consortium's web site.

Normative and some informative data about characters is kept in the Unicode Character Database. The structure of many of these property values will not be changed. Instead, the Consortium adds new files for new properties, so as not to affect the stability of existing implementations that use these values.

6. The Unicode Technical Committee and ISO

The character repertoire, names, and general architecture of the Unicode Standard are identical to the parallel international standard ISO/IEC 10646. Unicode provides additional properties and implementation information that ISO/IEC 10646 does not. Implementations conformant to Unicode are conformant to ISO/IEC 10646.

ISO/IEC 10646 is maintained by the committee ISO/IEC JTC1/SC2/WG2, which maintains a web presence at: http://anubis.dkuug.dk/jtc1/sc2/wg2/. The WG2 committee is composed of national body representatives to ISO. Details of ISO organization may be found at http://www.iso.ch. Details and history of the relationship between ISO/IEC JTC1/SC2/WG2 and Unicode, Inc. may be found in Appendix C of The Unicode Standard.

WG2 shares with UTC the policies regarding stability: WG2 neither removes characters nor changes their names once published. Changes in both standards are closely tracked by the respective committees, and a very close working relationship is fostered to maintain synchronization between the standards.

7. Process of Technical Changes to the Unicode Standard

Changes to The Unicode Standard are of two types: architectural changes, and character additions. Some architectural changes may not affect ISO/IEC 10646, for example, the addition of some informative properties to Unicode.
Those architectural changes that do affect both standards, such as additional UTF formats or allocation of planes, are very carefully coordinated by the committees. As always, on the UTC side, architectural changes that establish precedents are carefully monitored and the above-described rules and procedures are followed.

Additional characters for inclusion in The Unicode Standard must be approved both by the UTC and by WG2. Proposals for additional characters enter the standards process in one of several ways:

    1. through a national body member of WG2
    2. through a member company or associate of UTC
    3. directly from an individual "expert" contributor

The two committees have jointly produced a "Proposal Summary Form" that is required to accompany all additional character proposals. It may be found online at:

    http://www.dkuug.dk/JTC1/SC2/WG2/docs/form1.html

Instructions for submitting proposals to UTC may be found online at:

    http://www.unicode.org/pending/proposals.html

Often, submission of proposals to both committees (UTC and WG2) is simultaneous. Members of UTC also frequently forward to WG2 proposals that have been initially reviewed by UTC. In general, a proposal that is submitted to UTC before being submitted to WG2 passes through several stages:

    1. Initial presentation to UTC
    2. Review and re-drafting
    3. Forwarding to WG2 for consideration
    4. Re-drafting for technical changes
    5. Balloting for approval in UTC
    6. Re-forwarding and recommendation to WG2
    7. Two rounds of international balloting in ISO

About two years are required to complete this process. Proposals of any type are almost never directly approved by UTC on first viewing, but are usually sent back to the submitters. Repertoire addition proposals that are submitted to WG2 before Unicode are generally forwarded immediately to UTC through committee liaisons.

The crucial parts of the process (steps 5 through 7 above) are never short-circuited. Two-thirds majority in UTC is required for approval at step 5.
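
The voting thresholds that apply to UTC motions, including the two-thirds requirement at step 5, can be sketched as simple integer arithmetic. This is a minimal illustration only: the function names are hypothetical, and it assumes the thresholds are measured against the number of qualified voting members, as the description of motions states. The handling of abstentions is not covered here.

```python
def simple_majority(yes_votes: int, qualified_members: int) -> bool:
    """True when yes votes exceed 50% of the qualified voting members."""
    if qualified_members <= 0 or yes_votes < 0:
        raise ValueError("vote counts must be non-negative, members positive")
    # Integer form of: yes_votes / qualified_members > 1/2
    return 2 * yes_votes > qualified_members


def special_majority(yes_votes: int, qualified_members: int) -> bool:
    """True when yes votes reach 2/3 or more of the qualified voting members."""
    if qualified_members <= 0 or yes_votes < 0:
        raise ValueError("vote counts must be non-negative, members positive")
    # Integer form of: yes_votes / qualified_members >= 2/3
    return 3 * yes_votes >= 2 * qualified_members


# Example with an assumed body of 12 qualified voting members.
results = {
    "6 of 12, simple": simple_majority(6, 12),
    "7 of 12, simple": simple_majority(7, 12),
    "7 of 12, special": special_majority(7, 12),
    "8 of 12, special": special_majority(8, 12),
}
```

Integer comparison is used instead of division so that an exact two-thirds vote is not lost to floating-point rounding.
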
Proposals for additional scripts are required to be coordinated with relevant user communities. Often there are ad-hoc subcommittees of UTC or expert mail list participants who are responsible for actually drafting proposals, garnering community support, or representing user communities.

The two rounds of international balloting (step 7) have participation both by UTC and WG2, though UTC does not directly vote in the ISO process. Occasionally a proposal approved by one body is considered too immature for approval by the other body, and may be blocked de facto by either of the two. Only after both bodies have approved the additional characters do they proceed to the rounds of international balloting. (The first round is a draft international standard during which some changes may occur; the second round is final approval during which only editorial changes are made.) This process assures that proposals for additional characters are mature and stable by the time they appear in a second international ballot.

8. Public Access to the Character Encoding Process

While Unicode, Inc. is a membership organization, and the final say in technical matters rests with UTC, the process is quite open to public input and scrutiny of processes and proposals. There are many influential individual experts and industry groups who are not formally members, but whose input to the process is taken seriously by UTC.

Internally, UTC maintains a mail list called the "Unicore" list (unicore@unicode.org), which carries traffic related to meetings, technical content of the standard, and so forth. Members of the list are UTC representatives; employees and staff of member organizations (such as the Research Libraries Group); individual liaisons to and from other standards bodies (such as WG2 and IETF); and invited experts from institutions such as the Library of Congress and some universities. Subscription to the list for external individuals is subject to "sponsorship" by the corporate officers.
Unicode, Inc. also maintains a public discussion list called the "Unicode" list (unicode@unicode.org). Subscription is open to anyone, and proceedings of the "Unicode" mail list are made public via FTP on an occasional basis. Details are located at:

    http://www.unicode.org/unicode/consortium/distlist.html

All technical proposals for changes to the standard are posted to both of these mail lists on a regular basis. Discussion on the public list is also monitored by many members of UTC, and frequently the results of these public discussions are brought up later at UTC meetings. All technical issues and other standardization "events" of any significance, such as beta releases and availability of draft documents, are announced and then discussed in this public forum, well before standardization is finalized.

Anyone may make a character encoding or architectural proposal to UTC. Membership in the organization is not required to submit a proposal. To be taken seriously, the proposal must be framed in a substantial way, and be accompanied by sufficient documentation to warrant discussion. Examples of proposals are easily available by following links from the "Proposed Characters" heading available at the Unicode web site. The main proposal page is at:

    http://www.unicode.org/unicode/alloc/Pipeline.html

Guidelines are given at the following web location:

    http://www.unicode.org/pending/proposals.html

In general, proposals are aired on the "Unicode" mail list, sometimes for a long period, prior to formal submission. Generally this is of benefit to the proposer as it tends to reduce the number of times the proposal is sent back for clarification or with requests for additional information. Once a proposal reaches the stage of being ready for discussion by UTC, the proposer will have received contact through the public mail list with one or more UTC members willing to explain or defend it in a UTC meeting.