Preliminary Minutes - UTC #71 & X3L2 #168 ad hoc meeting

San Diego - December 5-6, 1996

December 18, 1996

1.                Administrative Issues

1.1.           UTC membership roll call

1.1.1.      Call for proxies

NeXT is represented by Ken Whistler

1.1.2.      Roll call - 15 present, 2 not present, 1 present by proxy

Present: Apple, Digital, HP, IBM, Justsystem, Microsoft, NCR, NeXT (proxy), Novell, Oracle, Reuters, RLG, SGI, Spyglass, Sybase, Unisys

Not present: Gamma Productions, MGI

Quorum is 9; we have a quorum.


1.1.3.      List of participants:

Jenkins, John
Rannenberg, Wendy
Gamma Productions
Carroll, Don
Ksar, Mike
Umamaheswaran, V.S.
Kobayashi, Tatsuo
Kondo, Hiroaki
Gotoda, Koji
Batutis, Ed
IBM - Lotus
Suignard, Michel
Sargent, Murray
Roberts, Gary
proxy - Ken Whistler
Honomichl, Lloyd
Kung, Michael
Texin, Tex
Progress Software
Wolf, Misha
Aliprand, Joan
Hart, Edwin
Mariani, Gianni
Silicon Graphics
Adams, Glenn
Hiura, Hideki
Sun Microsystems
Whistler, Ken
Freytag, Asmus
Winkler, Arnold
Gilbert, Judith
vivid studios

Legend:           P - primary, A - alternate, O - observer, L - liaison, X - ex-officio,
                        x - present, CM - corporate member, AM - associate member,
                        VP - Vice-President, Unicode Consortium



1.2.           Declaration of joint meeting

1.3.           Registration of new documents

Report - IAB character set workshop - Chris Weider
Apostrophe clarification - Mark Davis
Romanian request - Michel Suignard
Proposal for a standard compression schema for Unicode - Wolf, Whistler, Wicksteed, Davis
BMP and supplementary planes allocation roadmap - Moore, McGowan, Becker, Whistler
Supplement Arabic with Uighur, Kazakh and Kirghiz - China (Mao)
About the function of identifiers of Mongolian Proposal - China (Mao)
Concerns about UTF-8 conversion algorithm differences between Unicode and AMD-2 - Ed Hart
“Pipeline” draft list, summary of proposals - Joe Becker
Final Agenda, X3L2 #168
Final Agenda, UTC and X3L2 ad hoc meeting
Responses to WG2 N2767, support for projects of WG2 - WG2 N2770
Proposed draft amendment #10 to 10646 - Glenn Adams
BIG5, user-defined Chinese characters; Hong Kong procurement requirement
WG2 meeting - Action items - WG2, Uma

1.4.           Approval of joint meeting agenda

The agenda as amended was approved. Added items: 6.2.3, 6.2.4, 11.5, 11.6, 11.7

1.5.           Minutes and action items

1.5.1.      Approval of minutes UTC #70

Motion to approve the minutes, moved by Adams, seconded by Uma.

Motion approved: 11 for, 4 abstentions.


Action for Greenfield: Distribute list of attendees at UTC #70 to members (corporate representative and associates who attended that meeting).


Action for Greenfield: Correct spelling of Uma’s name in the minutes. Make sure that spell checker has correct spelling.


Action for Aliprand: Notify Greenfield of all action items for him from this meeting.


1.5.2.      Review of action items [X3L2/SD-2]

Reviewed, based on UTC 71 #38. The next action item list will combine X3L2 and UTC items in one document (X3L2/SD-2), with AIs in continuous chronological order. Each AI will be numbered as either a UTC AI or an X3L2 AI; these numbers will be in separate columns.


Action for Adams: Contact Everson re WG2 procedures for Cherokee proposal.


Action for Adams re UTC#70-A32: Document for next UTC meeting (fonts for public use).


1.6.           Meeting calendar

UTC, IUC, X3L2, WG2, IRG, WG20 …

Include W3C Conference in Santa Clara, CA, April 7-11, 1997.


Action for Aliprand: Have Greenfield check and revise the meeting calendar and distribute it to “unicore”.


UTC agreed to cancel the March meeting. Winkler proposed having only 3 meetings, all of them held jointly with X3L2. This would eliminate the problem of incomplete information for people who cannot participate in UTC meetings.


Future dates for the three joint X3L2 & UTC meetings in 1997 are:

May 29-30, 1997.

Aug. 7-8, 1997.

Early December (exact date to be set after dates for Tokyo IUC are finalized).


John Jenkins offered Apple as host of the May meeting if Taligent cannot host it.


Action item for Aliprand: Check with Davis re Taligent as host for rescheduled meeting in May.


January 13 - 17, 1997
January 20 - 24, 1997
March 2 - 7, 1997: SHARE meeting, San Francisco
March 10 - 12, 1997: IUC #10, Mainz, Germany
April 7 - 11, 1997: W3C conference, Santa Clara, CA
May 12 - 16, 1997
May 29 - 30, 1997: UTC #72 & X3L2 #169
June 23 - 27, 1997: SC2/WG2 and SC2
August 7 - 8, 1997: UTC #73 & X3L2 #170
September 3 - 5, 1997: IUC #11, San Jose
Sept. 29 - Oct. 3, 1997
November 1997
December 1997: IUC #12, Tokyo, Japan
December 11 - 12, 1997: UTC #74 & X3L2 #171 (TBD, tentative)


2.                Standing items

2.1.           Errata to Unicode 2.0

Errata should be reported to errata@unicode.org. Editorial corrections will be posted on the Web site without review by the UTC. Errata affecting technical content will be reviewed by the UTC and posted if approved. Presently there is only one glyph error (J with hook).


Action for McGowan: Bring dump of errata to UTC #72 for review.


Action for Aliprand: Notify McGowan of all action items for him from this meeting.


Ksar expressed concern about errors on the Unicode Web site, specifically in the section on allocation.

3.                Allocation of scripts

3.1.           Summary of allocation proposals - Moore, Becker, McGowan, Whistler [96-111, 96-115]

Whistler reported for Becker. The paper:

1)     describes allocation and coding per se, i.e., the proposals that have been accepted, rejected, or are coming down the pipeline;

2)     attempts an overall assessment of what should be encoded on the BMP and what might be encoded on supplementary planes.

Working group included Becker, Jenkins, Ksar, McGowan, Moore, Whistler.


Becker proposed making it a standing document that shows the progress for all proposals.


Suggested: the BMP should be filled with selected scripts, based on the expertise of the authors: all contemporary scripts, plus extinct scripts with large bodies of literature. The intention is to cover all living minority scripts on the BMP. The proposal takes into account the proposal from Everson in 96-101. It leaves room for the vertical extensions from the IRG (the target area was left empty).

Plane 1: all extinct non-Han scripts of the world

Plane 2: all additional Han characters


Freytag argued for inclusion of ideographic components within the BMP, to allow representation of characters that have not been standardized. Adams gave examples from Chu Nom, but said we should not make a final judgment so soon.


The document is meant for the UTC and should not go any further without first editing out personal opinions.


Motion, moved by Freytag, seconded by Jenkins:

UTC approves the high level policy decisions in guidelines document X3L2/96-111:

·         the 3 plane allocation as to number and nature (BMP, non-Han, Han)

·         placement of the right-to-left scripts between Arabic and Devanagari.

·         the proposed placement of Yi script between ideographs and Hangul.

Motion approved: 15 for, 1 abstention.


UTC should define the progression of the document.

Uma: this is just a guideline document, not cast in concrete. Differences with Everson should be eliminated, if possible, before submitting it.

Justsystem abstained because there is not enough room for kanji in the single plane allocated to Han: a Chinese dictionary has over 80,000 characters, so kanji need more space than just one plane. Jenkins said that any additional ideographic characters can spill over into another plane. Kobayashi said that Han characters need to be organized within their assigned plane.


Uma said that the document should highlight possible use of other planes. Ksar pointed out that WG2 does not reserve any planes for anything, so planes cannot be reserved for a specific use.


Motion, moved by Freytag, seconded by Jenkins:

The UTC accepts the technical content of X3L2/96-111, omitting the comparison section with the Everson proposal and the immediately preceding sentence starting with “X These …”, and will use this document as a starting point leading to a document that meets the action item to develop an NP-like scope for the 10646-x project in WG2.

Motion approved: 15 for, 1 abstention.


The document is to be edited based on the discussions in the meeting. The number of additional Han characters seems to be a bone of contention; leaving “holes” for efficient allocation of Kanji needs significant study.


Action for Whistler: Write first cut edited document within 2 weeks.


The document to be developed will take into account the e-mail from Sato with the guidelines.


Action for Adams and Suignard: In conjunction with the Japanese NSB, draft a work item scope statement on the WG2 request for a meta-level architecture of ISO 10646. Target date: WG2 meeting in Singapore. Uma suggested that relevant parts for the work item should be extracted from document X3L2/96-111.


3.2.           Allocating Ogham and Runes to the BMP: a strategy for making the BMP maximally useful - Everson [96-101]

Covered by discussions and decisions in 3.1.

4.                Specific scripts

4.1.           Deseret alphabet

4.1.1.      Proposal for encoding the Deseret Alphabet in ISO 10646 - Jenkins [96-104]

Deseret should be encoded outside the BMP. Jenkins developed this proposal to allow encoding of a script on a supplementary plane, in order to encourage implementation of UTF-16. The script needs 4 columns.


Motion, moved by Sargent, seconded by Mariani:

The UTC accepts the Deseret script repertoire, and recommends that it be encoded off the BMP.

Motion passed: 14 for, 1 abstention, 1 absent from room.


Action for Jenkins: Forward proposal to WG2 as a contribution from the Unicode Consortium. Give the proposal to Ksar for WG2 mailing.


4.2.           Yi script

4.2.1.      WG2 N1187 Encoding the Yi script - Everson [96-098]

4.2.2.      WG2 N1415 Proposal for encoding Yi script on BMP - China [96-099]

Whistler has studied both proposals. Everson proposes a combining tone mark; he also proposes spelling the syllable names in pronounceable form. Ken suggests that the UTC support the Everson proposal.

Adams: Other Yi syllables have the tone inherent; an explicit, separate tone mark would be a variant treatment. China is researching a revision to its proposal. Mao is making a trip and will come back with his findings, but the Chinese probably like their current proposal. Names are less likely to be an issue than the tone mark.

Sargent: The implementation question is important. Support for combining marks is being worked on, and the technology will be available some time soon.

Jenkins: prefers the combining approach, but will not fight hard over it.

Asmus: let us not be boxed in on the combining method.


Motion, moved by Freytag, seconded by Jenkins:

Unicode representatives at the WG2 meeting are instructed to push for the principle of using combining marks, and in all other respects to support Everson’s analysis, that is:

1)     use of combining marks for tone mark,

2)     naming of syllables more closely corresponding to Lolo phonetics,

3)     addition of Yi radical set.

Motion approved unanimously.


4.3.           Mongolian

4.3.1.      WG2 N1437 Report of 3rd Mongolian encoding meeting [96-102]

4.3.2.      WG2 N1438 Draft on encoding Mongolian character set [96-103]

4.3.3.      About the function of identifiers of Mongolian Proposal [96-113]

Jenkins: The proposal looks plausible on the surface, but Becker needs to provide input. Discuss on “Unicore” list.

Glenn: the proposal introduces an identifier character to define the presentation form. We might not be able to avoid introducing this character, as the presentation form cannot be determined by machine. Could this character then be used with other scripts, such as Arabic? More work will be needed to define the architectural impact of the identifier character. The numbers for Mongolian in the roadmap document assume use of the identifier character, not presentation forms. Ksar said that we do not expect a final vote in Singapore.

Guidance for Singapore: move beyond the e-mail discussion and contribute!


Action for Adams: Summarize unicore discussion for Singapore.


4.4.           Uighur, Kazakh and Kirghiz

4.4.1.      Supplement Arabic with Uighur, Kazakh and Kirghiz [96-112]

Glenn: Presentation forms should not be allowed, since they can be predicted by a presentation engine. The language and the font need to be known for correct prediction; another font might need yet different presentation forms.


Motion: Moved by Whistler, seconded by Sargent:

The UTC is not in favor of addition of the Arabic presentation forms as in 96-112 that are renderable by algorithm in accordance with the character/glyph model.

Approved unanimously.


4.5.           Runic

4.5.1.      WG2 N1417 2nd revised proposal for Runic character names [96-100]

Adams: Alternative names for runes were moved to Annex P. The Swedish NSB now wishes to remove them.

Freytag: Alternate names in Unicode should also be in 10646, if requested. Publishing alternate names is easier in Unicode.

Ksar: Annex P should not be used as repository for alternate names. Ken supports that.

Suignard: Annex P is a last resort for reaching agreement; we need sensible consensus first.

Uma: WG2 is asking for contributions on what goes into Annex P; Unicode should contribute.


Motion, moved by Ksar, seconded by Whistler:

The UTC does not support adding the Runic alternate names to annex P.

Approved by consensus.

Consensus of UTC (including proxy) that Runic can go into the BMP. Use of Annex P is an exceptional procedure if no other compromise can be found.


4.6.           Romanian

4.6.1.      Romanian request - Suignard [96-109]

This is more a font issue than a coding issue. Romania wants to add characters to Latin-2, just for the sake of presentation. Addition of the characters would create a migration nightmare.

Carroll: the type industry wants to make characters look good. Commas and cedillas below often look similar. A language or script code allows selection of a different font; another possibility would be to use combining characters with cedilla and/or comma below.

The motion from UTC #69 opposing addition of these proposed characters was endorsed.

4.7.           Armenian

4.7.1.      Suignard

No input

5.                Current ballots

5.1.           JTC1 TAG question about JT/96-0432 [96-085]

CD or DIS ballots? European synchronization issue. Wait for JTC1 proposal.

5.2.           SC2 request for comments on IRG proposal for 6585 additional ideographic characters in the BMP [96-084, 96-086]

UTC and X3L2 support the inclusion in the BMP. The Japanese NSB is reported to have said that some characters might not be legitimate, that only legitimate characters should get into the BMP, and that the 6585 characters should be re-checked for legitimacy.

Kobayashi said that the aim is to make the standard perfect, and NOT to add incorrect characters. This is the position of the Japanese TAG to SC2.


Motion, moved by Adams, seconded by Jenkins:

The UTC instructs its representatives for WG2 that the UTC position is to prefer encoding of Vertical Extension A on the BMP, but the liaisons should remain flexible.

Approved unanimously.

Jenkins: We want them encoded NOW in 10646. Better off the BMP now than on the BMP in 1-2 years.

5.3.           DAM#5 Korean repertoire [96-079, 96-080]

5.4.           DAM#8 Unification CJK ideographs [96-081, 96-082]

5.5.           Endorsement of Mike Ksar as convenor of SC2/WG2 [96-096, 96-097]

5.6.           pDAM#9, Identifiers [96-095]

6.                Interaction with SC2

6.1.           Reports and documents from SC2

6.1.1.      Resolutions from SC2/WG2 meeting in Québec - [96-093]

Dealt with in September UTC meeting. U.S. position deferred to X3L2 meeting.

6.1.2.      Resolutions from SC2/WG3 meeting in Québec - [96-092]

6.1.3.      Resolutions from SC2 meeting in Québec - [96-091]

6.2.           Input for next WG2 meeting

6.2.1.      UTC positions for WG2 and IRG meetings

IRG positions:

1)     Position on Vertical Extension A (see above)

2)     New composition method proposal from Prof. Shih in Taiwan, who will be attending the IRG meeting. It will be discussed in Singapore. UTC favors a standardized composition method, with no preference for a specific one. Open issues are the levels and the position.


Adams asked about the Berkeley meeting with Prof. Lancaster and his colleagues in September. Whistler said that the attendees were aware of the issues and have seen Vertical Extension A. They have a repertoire of 15K characters that are not in the URO. These 15K characters have not been unified, nor have they been checked against Vertical Extension A.


Adams said that members have been concerned about Hong Kong characters, and asked whether the UTC should be collecting a US contribution.


Action for Jenkins: Act as contact point with Prof. Lancaster regarding work on a U.S. contribution to the IRG.


IRG should give the UTC the fonts for distribution of information. Glenn has another source. The liaisons are entitled to ask for the fonts in Singapore.


Action for Jenkins: Draft letter from Consortium (for Davis to sign) to IRG re getting fonts for Vertical Extension characters.


Jenkins said that lack of characters from CNS 11643 was hindering acceptance of the Unicode Standard in Taiwan. Only planes 1-3 are in the URO.


Motion: moved by Jenkins, seconded by Uma:

The UTC favors addition of all unique characters from CNS 11643:1992 (all planes) to ISO 10646.

Approved unanimously.


Action item for Jenkins: Have Cora begin work on checking for unique characters in CNS 11643:1992.

6.2.2.      Unicode Consortium delegate to SC2/WG2 and IRG

Motion: Moved by Adams, seconded by Winkler:

That John Jenkins be the Unicode Consortium’s representative to the IRG.

Amendment by Freytag: That Jenkins be the Consortium’s primary representative, with Glenn Adams and Michael Kung as alternates.

Amended motion passed unanimously.

Glenn Adams and Asmus Freytag were appointed Unicode representatives to WG2 for the meeting in Singapore (by consensus).

6.2.3.      Ethiopic pDAM #10 text [96-119]

Action item (Adams): Revise pDAM to conform to WG2 instructions.


Motion: moved by Adams, seconded by Jenkins:

The proposed character Ethiopic Space which was present in WG2 N1420 should be removed upon further consideration, and should not appear in attachment A or B. Additional study indicated that this character should have been unified with SPACE, and that no “intrinsically Ethiopic” space is required.

Motion approved unanimously.


[ Mike Ksar later informed the Chair that he had made a motion to ensure that the pDAM text prepared for WG2 was exactly in accord with the original proposal to WG2 (i.e., including the Ethiopic space). Neither the Chair nor the Vice-Chair had recorded this motion. The need for it was eliminated when Glenn Adams agreed to revise the pDAM text to remove the additions that were not in accordance with WG2 instructions.]


Action item (Unicode representative on X3L2): When pDAM comes up for ballot, Ethiopic space should be requested for removal.


Action item (Unicode liaisons to WG2): Communicate this position informally in Singapore to other representatives.

6.2.4.      Response to WG2 action item list [96-121]

Action item: Joan and Mike Ksar to clarify liaison between TC46 and WG2 (Joan).


Discussion about “collection identifiers” after Hangul changes in 10646. Are they needed in Unicode, and if so, how? We need to be responsive to requests for collection identifiers.

Freytag proposed an ad hoc committee to come up with the Consortium’s position for the WG2 meeting. Ad Hoc Committee members to be Uma, Whistler, Suignard, Hiura.


Quad symbol:

Motion: moved by Adams, seconded by Jenkins:

The UTC accepts the change of APL quad symbol from 237B to 2395

Motion approved: 15 for, 1 abstention


Action item for Winkler: Distribute WG2 N1396 to X3L2 and UTC


Action item (Freytag and Adams): What should or should not go into annex P of 10646? Specify a proposal as Unicode reps to WG2.


Action item: Winkler to send WG2 N1416 to Ken Whistler

Action item: Winkler to distribute WG2 N1385 to X3L2 and UTC.

Action item for Adams: Response to proposal for adding special letters for Nigerian Yoruba (WG2 AI 31-2)

7.                Script and language codes

7.1.           Script codes

7.1.1.      Codes for scripts - proposal from M. Everson [96-088]

7.1.2.      Policy statement on script codes - Everson [96-087]

Whistler said the issue of script encoding was premature. Is a standard characterization needed? Postponed to next meeting.

7.2.           Language tagging

7.2.1.      Language tagging - Mark Leisher

Mark was unable to prepare anything.

ISO 639 is being revised. Mnemonics are language dependent. Will be discussed further in the next meeting.

8.                Internet issues

8.1.           Internet charset tags for Unicode (UTF-7, UTF-8) - Misha Wolf

We need a position that squares with reality. The UTC (at meeting #70) voted to distinguish between versions, but vendors are not distinguishing between versions. Misha recommends a version-independent Unicode tag.


Adams pointed out that there are two factors: identification of the encoding system and identification of the repertoire. “charset” in the MIME context = character encoding scheme.


Motion moved by Adams, seconded by Wolf:

UTC to register “UTF-8” and “UTF-7” with IANA and undo its September 1996 decision to register version 2.0 designators (action item 70-A39).

Motion passed by a two-thirds majority: 13 for, 2 opposed, 1 abstention.


A coding system definition (UTF-8) needs no version in its charset name (the IETF term for a character encoding scheme). A version label is a different story: overly specific labeling can lead to rejection of simple English text because of a change in Korean.

Uma: In the context of Internet traffic, using the latest version, “UTF-8” is not ambiguous.

Freytag: Need to work out a general policy on this. Reserve generic names for most up-to-date version, with distinctions for previous versions. Mariani pointed out the problem of using “UTF-8” as a generic when it is embedded in data.

8.2.           Report - IAB character set workshop - Chris Weider [96-107]

For information

9.                Unicode International Conferences

9.1.           Mainz, Germany - March 10-12, 1997

The conference is the result of hard work by a small group of volunteers. Misha asks that member companies put links to the conference web pages on their networks.

9.2.           USA

San Jose, September 3-5, 1997. Misha is chairing the editorial board of the IUC.

9.3.           Japan

Tokyo, early December. Exact date still to be finalized.

10.           Old business

10.1.       Equivalence of combining characters - WG20 [96-089]

Unicode definition of equivalence will be used in WG20’s sorting standard.

11.           Other business

11.1.       WG20 request for character property tables [WG20 N498]

To be handled by the Officers.

11.2.       Unicode compression - Wolf, Whistler, Wicksteed, Davis [96-110]

All we can do is define the process for progressing the proposal. Wolf recommends leaving it on the Unicore discussion list a little longer. Wolf is empowered to post the revised draft documents according to the W3C method (old, current, editors...).


Action item for Wolf: Act as editor for a draft document incorporating feedback from today's meeting and on 96-110, by January 31, 1997. Discussion on Unicore, including disposition of comments.


Action item for Adams: Create a “Working Paper” section on the Web site, install the document there, and create a section for comments.


Action for Aliprand/Winkler: Withdraw 70-A14 action item.

11.3.       UNIX and Unicode - Gary Roberts

UNIX does not support Unicode to a great extent. Why is this, and how can we alleviate the problem? A UNIX SIG would be of great value to all interested parties. Freytag suggested that Roberts form a SIG to discuss the issue and come up with recommendations.

Rannenberg: X/Open has strong recommendations for APIs.


Action for Roberts: Collect names of people interested in working in the SIG. (Mariani, Texin, Rannenberg, Kung, Hiura expressed interest.)


Action item for Roberts: create list of issues, report via “unicore” list or else at May UTC meeting.

11.4.       Apostrophe clarification - Mark Davis [96-108]

No discussion document provided.

11.5.       UTF-8 Hart [96-114]

Hart: What to do with undefined codes?

Freytag: the standard does not define what should happen. For illegal input, the Unicode sample implementation will react in a compatible manner; the generic algorithm allows anything.

Ksar: Implementation guidelines can be followed, other implementations might do different things.

11.6.       Meaning of UTC votes on standards in context with WG2 work

Whistler: it is difficult to keep the Unicode and 10646 standards in sync. Old model: UTC decides, then lobbies WG2 for acceptance... The new model of cooperation with X3L2 changes the situation somewhat; how should the UTC work?

Freytag: Consensus through submitting to WG2 instead of accepting into Unicode. Tracking of status of proposals is a necessity. Pipeline ad-hoc group could possibly update the pipeline document in real time.

Adams: who drives whom, or is it cooperation?

Uma: both standards are quite stable, so less confrontation occurs.

Ksar: things have changed for the better over the last few years. A WG2 tracking mechanism is available; check whether we can enhance it.

Hart: going through WG2 has the advantage of worldwide acceptance.

Asmus: Do we have the documented approval of our membership? We need our own tracking, especially if our feedback is not accepted by WG2. Ksar agrees.

Jenkins: what is the process and flow of proposals?


Action item Ksar: send out URL of Thygessen document.


Action item Uma, Ken, Jenkins: Draft UTC process flow additions to this document.

11.7.       Standards subset of Unihan (collection)

No comments

12.           Character glyph model

12.1.       Status - Hart, Griffee

Ed Hart reported on input from the NBs. Al Griffee and Ed Hart will meet on the weekend to resolve comments. A disposition of comments will be distributed to UTC and X3L2.

13.           Review of recommendations to X3L2, and action items


14.           Closing of joint meeting

The Chair thanked NCR for hosting the meeting, and Gary Roberts for making the arrangements.