From whistler@zarasun.Metaphor.COM Fri Apr 5 18:46:38 1991
Date: Fri, 5 Apr 91 18:35:44 PST
From: whistler@zarasun.Metaphor.COM (Ken Whistler)
To: unicore@Sun.COM
Subject: UTC#46 minutes

At UTC#46, Asmus printed and distributed the raw-draft minutes from the subcommittee meeting of 3/25 and from day 1 of the UTC meeting itself, 3/26. In the interest of completeness, I am distributing the raw text of the computer notes from day 2. No guarantees. This is raw material, no editing. Edited minutes appear later.

--Ken Whistler
Unicode Secretary.

=======================================

Draft Minutes from the UTC meeting 3/27 at Apple

==== DRAFT =====

Roll Call
---------

Not present: Ecological Linguistics Pacific Rim

Decision to delay diacritics because of absence(s) of individual representatives.

III. Technical issues (cont.)

Misc. Proposals
---------------

Lunate Sigma
------------

Proposed at U+03f2. It is a character used in Byzantine Greek. It can be used for either regular or terminal sigma.

Discussion focuses on the fact that the variation is not predictable and that to not code the character would force users of Byzantine Greek to use and tag a separate font where only one (or two, incl. terminal sigma) positions differ.

Vote: 13 for, 1 against, none abstain.

Documents passed out
UTC-91-48 Minutes from 3/26
UTC-91-49 Extended Hangul syllables
UTC-91-50 Hastings comments on conformance
UTC-91-51 updated member list
UTC-91-52 attendance 3/26
UTC-91-53 Hebrew 03.26

Alternate Rho
-------------

Alternate rho needed to map SGML and other technical uses of this variant.

Proposal to add the alternate rho at U+03F1

Vote: for 13, 1 abstain, none opposed

IPA additions
-------------

IPA met in Kiel in 1989 and revised the standard and published the revisions in the Journal of the International Phonetic Association in early 1990.

Proposed to match in Unicode the IPA as of the 1989 level. This means adding a small number of base forms and modifier letters.
The additional base forms are listed in UTC-91-47 from U+0299-2ad.
1) small caps with defined phonetic usage
2) otherwise unique base forms (e.g. small letter cross tall j)
3) set of formally defined ligatures (e.g. ts)
4) small subset of Greek characters with defined phonetic usage different from Greek

Set 4) is controversial since these characters are glyphically not distinct from the regular Greek.

Names to be improved for U+0297,8 POPCORK and KISS to match IPA names. Potential problem for KISS, which is bullseye for IPA, and we already have a bullseye, albeit not a "LATIN SMALL LETTER BULLSEYE".

Proposal to add the base forms as listed in UTC-91-47 from U+0299-2a9.
Amendment to add the j hacek to the extended block, but not provide an uppercase form. To be put in U+01f0. Leave out the 4 Greek characters.

Vote: 14 unanimous.

4 Greek characters: proposed to shelve them, but with aliases to the Greek alphabets.

Vote: 13 for, 1 against, no abstentions.

Proposal to add 10 further modifier letters shown from U+02de.
1) completed repertoire of superscripted modifier letters
2) a set of tone letter primitives. Ligatures may be formed when several occur in sequence.
3) Rhot. hook

Vote: 14 unanimous.

Scheinberg points out that UTC's position on Hebrew is in sync with the proposed standard as in UTC-91-53.

IPA: diacritics

Proposed to add nine diacritics U+0348-50 in UTC-91-41. This completes the IPA repertoire EXCEPT for two double diacritics, LIGATURE MARK above and below.

Bishop points out, and Whistler concurs, that we need to state cases which we considered and shelved, to distinguish this case from omissions.

Vote: 14 unanimous.

Handed out the draft minutes from 3/26 and called for corrections.

Proposal to add:
1) Turkish dotted i to provide an explicit lower case form for Turkish LATIN CAPITAL LETTER I DOT U+0130.
2) Rhada barred capital B
3) dotless lower case j

Discussion:
Becker points out that 1) is not a complete solution, but would require cloning a capital dotless I.
Scheinberg: This would solve upper casing round trip at the expense of a major headache for.
Collins: we do have several forms of capital D BAR.
Scheinberg: even for English Aa there is no roundtrip (you get AA or aa).

Vote: 13 for, one against, one abstention.

2) and 3) retracted in favor of 2.0 wishlist.

Glyph alternations for Czechoslovak letters.
---------------------------------------------

There is a problem that the Unicode descriptions and charts are misleading as to modern Czech and Slovak usage.

Proposal: Explanatory material in two places:
1) A dual glyph indicates that the character codes encompass EITHER glyph. (general introduction)
2) The usage in Czechoslovakia is as follows: D ap T ap are historic. The apostrophied forms are primary in book publication for L, t, l and d. Typewriters and handwriting use the other forms (with hacek). (European)
3) Reword the text in the name list to not indicate the primary form as a "variant"

Vote: 15 unanimous

Proposal to remove the alternate renderings for D and T and reverse the order of alternate glyphs for the other cases.

Vote: for 6, against 6, abstained 3

3. 3.3 Page separation, double-hyphen, figure-dash

All proposals withdrawn (double-dash has already been removed). Requested information from companies on figure-dash semantics (it is assumed to be a figure-width (digit-width) dash).

III.I.4 FF special character proposal (for indicating byte swapping).

WordPerfect: realistically, would add this character at the beginning of text.
DEC: this is the big-endian/little-endian issue: reason for having this is for communication. It is a heterogeneous world, and we have to deal with it.
Lotus: if we don't add it now, we don't add it.
Sun: could also indicate that it is Unicode and not ASCII.
Others objected that it is confused.

Vote: 14 in favor of adding, none against, 1 abstain.

Document passed out: UTC-91-54 FF space proposal

II. Technical issues (from previous day)

H.
Diacritics proposals

Greek breathing marks

Proposal from subcommittee
2) make U+0371,2 non-spacing
   add spacing clones for 0370, 0385 at 03F3 and 03F4

Vote: 15 unanimous

Double diacritics

Proposal
0) as is
1) from subcommittee: to remove for 1.0
   a) 5 double diacritics 0339-033d
   b) Cyrillic 0484
   and repair the holes in the chart

Vote: 10 for, opposed 3, abstain 2

2) Add the "double diacritics" again, but with a different semantics:
   a) as is (keep them deferred)
   b) as single diacritics which overhang on the right
      Vote: 7 in favor, 7 against, 1 abstain
   c) as doubly overlapping zero width characters (retracted)

3) to remove the "piecewise" diacritics in U+0316-9.

Vote: 14 for, none opposed, one abstain

An editorial comment is needed to declare these as deferred.

Kana
----

K0: as is
K1: change U+309b,c to spacing, add U+3099,A as non-spacing clones.

Consensus to adopt from SC minutes:

Consensus: Unicode does not have a unique spelling of text elements.

Level 1: A ACUTE = A + ACUTE
Level 2: A ACUTE UNDERDOT = (all permutations)
Level 3: Some people would prefer canonical spelling
         A ACUTE UNDERDOT != A UNDERDOT ACUTE

Proposals:
0) informative material [10]
1) recommended order [5]
2) standard order (conformant) [2]

Consensus is to beef up the informative material (Suggestive Advice to programmers); Whistler noted that this is consistent with the decision made at UTC#45 that Unicode does not prefer composed forms over composite forms or vice versa.

Proposal 0 as beefed up under Consensus is to be put to UTC #46.

III. I. (cont'd)

5. Plans for an intermediate release of Unicode ("1.1")

Motivation: we are finishing up and it is very important to get 1.0 out. But there are a number of issues which are very far along but still need some extra time. These issues, but not whole addition of new scripts, would be considered for what is essentially an update. This update is proposed for 4/Q 91.
Proposal 1): Add general text that Unicode will generate supplements which will be available through Unicode Inc. if not through Addison Wesley. Add to the minutes the intent to do the following supplement:

Name: Unicode 1.1
Date: Last Quarter 91
Format: Unicode 1.0 plus Unicode 1.1 supplement

Vote: 14 for, none opposed, one abstention

Informational points after the lunch break.

E-Mail for the editor after 3/31: Erica@netcom.com

Addison Wesley will give a 50% discount for orders in quantity.

J. CJK
------

Documents handed out
UTC-91-55 Disposition of CJK issues
UTC-91-56 Indic Charts
UTC-91-57 Indic names list
UTC-91-58 Thai charts & names
UTC-91-59 Anderson Dotted Charts
UTC-91-60 Anderson Fundamental Problems
UTC-91-61 Anderson BIDI

Proposals from CJK subcommittee

1) Add characters
   a) Kaeriten from U+3190 to U+319f
   b) 6 IBM Korean Han Characters to Unihan level 2
   c) CJK symbols from Adobe (some already voted on as part of Dingbats)
*2) postpone consideration of YEN/YUAN to 2.0
3) Include CNS vertical forms in CZone
   Amended to empower SC (Joe Becker) to make the call
*4) Defer decision on duplication of 213 radicals to 2.0
5) 19 IBM Kanji: add to Han CZone.
*6) Designate liaison to JISC
   Lee Collins volunteers to attend the first meeting as UC primary liaison. Rick McGowan volunteers to be the alternate for the next following meeting.

Starred items are not proposals, but SC recommendations.

Voting on 1)
Vote: 14 unanimous

Voting on 5) amended to empower Collins to decide where they are placed.
Vote: 13 in favor, none opposed, one abstention

Voting on 3) amended to: include them where appropriate, as time permits, but only as a block (i.e. if not all are settled, do none).
Vote: 12 in favor, one opposed, one abstain

8) Move Super and Subscripts to CZone (retracted)

9) Add editorial text to indicate characters that are mainly encoded to provide compatibility (This could best be done in the block intros, A.) and add mappings and decomposition tables as appropriate.
Vote: For 13, opposed none, one abstention

Hangul

1) Defer composition method until 1.1
2) Declare existing Jamos to be spacing
3) Defer decision on adding 1924

Vote: 13 in favor, none opposed, one abstain

4) Shift Han codes by 2048, creating unassigned zone (or by 4096)

Discussion focused on the architectural nature of the question.

Joe: shifting the HAN implies that the Hangul characters will be open.
Issai: this does not imply that; these are for any additional characters.
John: prefers 4 K, to allow more space for alphabetics.
Issai: 4 k is better

Friendly amendment to make Han start at 5000K. The new block is to be unassigned. Issai accepts.

Eric, Rick: we should open up the issue of adding
Ken: we have a contiguous area of Han compatibility characters, we should move them also.
Issai: accepts moving the Han compatibility characters.
Stewart: should move all CJK up.
Ken: that would take too much time to remap.
Lee: only reason to move is to leave room for Hangul.

Voting: 6 in favor, 4 against, and 4 abstaining. Motion fails (needed 2/3 vote)

Proposal 5: Hangul

UTC authorize the addition of 270 Han corporate characters, subject to decision from Issai.

Joe: should use user-codes.
Issai: this is not as useful.
Joe: should add the Chinese Hangul characters as well, if we add the corporate characters.
Asmus: see strong requirement of 270. I am split on the 1924. Would like to focus thinking: no implementation that depends on the 270 characters in this time frame before 1.1.
Issai: companies may want to have the characters before 1.1.
Asmus: Mostly dependent on US software, so Korean use would be critically dependent on US release.
Issai: does MS depend on 1.0?
Asmus: mostly 8859 languages.
Issai: the 270 characters are important and implemented, so if someone wants to implement right away it is important.
Avery: 270 are not going to be in the same order as Korean 1924.
Joe: no production problems in adding the characters.
Asmus: split proposal 5a: add the 270 characters in 1.1, codepoints currently unassigned. Seconded by Issai.
Joe: is it assumed that if we add 1924 then they will be a part of that, otherwise they would be in a block at the end of the Hangul as in document UTC1991-49.
Issai: that is part of the proposal.

Issai agrees to vote on 5a first.

Voting: 11 for, 0 against, 1 abstain, 1 not present. Motion passed.

Proposal 5: Issai amends to a letter ballot. That is, in 1.0 allow Issai to decide on the addition of 270 Hangul with codepoints assigned as in document 49, by letter ballot by the end of April (following procedures agreed upon for BIDI).

Voting: 1 for, 11 against, 0 abstain.

Ken: clarification: these two proposals imply that there is no change for Hangul in 1.0.

III. L. Walk In proposals
-------------------------

1.) Devanagari, Thai

The layout of the Indic scripts is structured so that the same elements appear on the same location.

xxx Length Mark
xxx AI length Mark
xxx AU Length Mark

have been added (as slots) in all the scripts in the same positions to simplify monotonic implementations for surrounds.

Vowels U+090d, U+0911 get descriptive names.
U+093c is invisible.

Proposal is to encode the Indic scripts as indicated in UTC-91-56. Amended to flip Bengali 09f0,1, close Gurmukhi U+09f column and move up the Oriya U+0b71 to U+0b70.

Vote: 13 for, opposed 1, 1 not present

Thai, Lao

The encoding is an image of the Thai industrial standard. In addition, for implementations of phonetic order Thai, five additional phonetic order vowel signs are added (and shown with square circles, since they are not really diacritics) and named as such.

Proposal to encode Thai and Lao as in UTC-91-58, amended to move phonetic order vowel signs over to a higher numbered column.

Vote: 14 in favor, one abstention

2.) (there is no # 2)

3.) Arabic

U+06b8 LAM with 3 dots below, U+06b9 LAM with 3 dots below and small v. are currently "unknown".
Proposed to remove for 1.0 pending further investigation.

Vote: 11 in favor, one opposed, 3 abstentions

Other "unknowns" in the Review draft could be verified by RLG and will be identified in 1.0.

4) Arabic

Proposal to drop U+06bf and wishlist it.

Vote: 10 in favor, one opposed, 3 abstained

5) Hebrew

Recent Israeli communication indicates a mismatch on the cantillation marks. They are actively working on these issues in Israeli standards efforts. Cantillation marks, unlike vowel marks, present rare usage.

Proposal to defer the cantillation marks. Amended to explain that they are deferred for purpose of synchronisation with Israeli standards.

Vote: 15 unanimous

6) 14 additional Arabic spacing diacritics

Proposal to lay them into column U+fe7

Vote: 13 in favor, 2 opposed, no abstentions

7) Changing Arabic names

Anas: Institute of Arabic & Islamic Sciences in America agrees that official names should be used rather than colloquial names.
Joe: official names, does that imply an official transliteration?
Joan: I have a translation of ASMO, signed by the secretary, using the official names. This is basing the English on a particular pronunciation.
Anas: in fairness, there is an argument against this: the Farsi, etc. use something more like the colloquial (Apple) names (e.g. beh instead of baa).
Issai: we discussed this in subcommittee, and ASMO didn't overrule this in Amman (10646).
Joan: The rule in 10646 is to use the earliest official international standard, which does use the classical names. There is a list.

Motion to adopt the classic names as in Appendix A: legend for Arabic Alphabetical Characters (dropping the apostrophe).

Vote: 10 for, 3 against, 2 abstain

8) Add Roman S cedilla, Roman S comma to extended Latin

Lloyd: these forms are used in Turkish & Romanian, and need to be in the proper form in plaintext.
Rick: it is possible to distinguish between these by using explicit diacritics.
Lloyd: Europe is supposed to be covered by composed characters.
Rick, Joe: the existing standards do not distinguish between these characters.
Asmus: do get into problem when existing standards are ambiguous. Would need to add both of the other forms to distinguish the ambiguity.
Eric: Lloyd, are they typing in the same words in plaintext, with no structure? Same as CJK, only minor variants are not distinguished, language is not part of the spec.
Bill: In written Romanian it is always written with a comma: the names in both ISO standards are the same.
Avery: am sympathetic, but we can add the unambiguous characters later.
Glen: we just added Vietnamese-specific marks.

Voting: 1 for, 8 against, 6 abstain.

Offline: Bur edited the book cover, and suggested it for Addison Wesley

Proposal 9: Take .5K from the User space and give it to the compatibility zone (Isai).

Rick: the czone started with a few chars and has continued to grow.
Joe: czone was designed not to grow.
Eric: this promotes impure thought - vote it down.
Isai:...more...
Rick: Companies have known about this for more than a year.
Joan: some of HP's will overlap with the IBM characters.
Mark: they can use the IBM characters with their tails.

Vote: 2 for, 10 against, 1 abstain

Added two issues by Lloyd

10. BIDI proposal

Lloyd: recommends additions to the BIDI algorithm.

Resolution: subcommittee

11. Unicode 1.0 schedule

Lloyd: tremendous appreciation for work so far, but frustrated by difference in world view, in people that appreciate the structure of language, and those that don't. Is enormous imposition on editors, but there are a lot of indications of afterthoughts. Example: circled number zero. This will not help implementors. There are a lot of Q&A documents, etc. that need to be prepared. Things discovered in editorial text are real problems, e.g. diacritics vs. Indic matras, e.g. diacritics vs. floating. UTC1991-60 details these. Don't think that it is ready.
Proposal: Make the meeting on the 30th a UTC meeting, and not just an editorial meeting, limited to changing code positions.

Rick: the finished product should not show the results of moving characters arbitrarily; however, we should not take the next 6 months to do this. At this point, it is too late. Personally, I would be willing to reposition all the codes.
Avery: This is not popular, but we should delay 1.0. I would agree with that. I want a 2-3 month delay. Not enough time to get feedback from national bodies.
Asmus: would put all the a's on the first page, etc. but in so many instances we have imposed ordering because of not-so-professional standards (e.g. 8859/1). Want something that is very clean. However, as a realist, we have made certain decisions that burden us with other choices. Even if we are very good, there will be additional characters that don't fit. If there are any few characters that could be moved, that is ok. However, for MS, will not delay by even 24 hours.
Mark: reordering will not get us much (longer diatribe).
Joan: we would have to revisit more than code positions. We should then revisit the garbage character issues, if we do that.
Anas: we should defer anything that has a lot of argument to 1.1.
Issai: we are missing the point of Lloyd's argument. He complimented us. His point was more to make distinctions and important clarifications of character properties.
Avery: needs to be done right, proposal to delay it by 3 months.
Lloyd: does not support that. Cantillation marks are a good case of something that is reasonably delayed. What I am talking about are cases like diacriticals. When Asmus said that diacritics were semantic, that showed a problem. We should get the editorial text clarified.
Rick: we need to be clear on diacritics vs. outlines. We discussed changing the images. I am volunteering the time.
Mark: we agreed to make the distinctions in the text. The pictures can be changed in the future.
Ken: we can fix the pictures.
Ken: (speaking as Metaphor): when do we put a stake in the ground? We almost closed the spec several times. We have had several important delays, to get more feedback. I don't see any additional milestone that is worth delaying them.
Eric: seconded proposal, we are on the edge. We are grappling with the proposal, clear unambiguous interpretation. We will be unable to convince others to support this.
Asmus: changes are three kinds: relaxed principles (exclusions such as compatibility zone); oversights (missing symbols); removing things that were perfectly reasonable, because of slight uncertainties. Am afraid of the situation where we are making all three kinds of changes. The more times you ask a question, the more perfectly reasonable things we see. There is a benefit to taking the plunge. We have a lot of "equally reasonable" decisions.

Proposal 12: Delay Unicode by 3 months

Vote: 5 for, 9 against, 0 abstain.

Issai: Lloyd can have a letter ballot; a friendly amendment.
Lloyd: The issues that I raised have been available, and no one has addressed them.

Proposal: Have a UTC meeting at the end of the month.

Vote: 2 for, 7 opposed, 5 abstaining. Motion fails.

Digital, Lotus state unequivocally that the motion for the proposal was in no way a sabotage of Unicode.

Editorial meetings:
1. on 12th
2. on 30th
At RLG

Next UTC meeting for June 7th.

Issai: suggest 2 day meeting. Also, invite Chinese, Korean to UTC meeting, to be held in May (WG2 is 13-17). Should be on the 20th if WG2 is a short meeting. Opportunity to invite people outside America to participate in UTC.
Ken: Have the UTC direct the CJK subcommittee to invite them to the CJK meeting on the 17th. Move the editorial meeting to the 20th.
Joe: Very important to attend the April 19th ANSI meeting.