Minutes from UTC/L2 in Palo Alto

L2/99-054

May 25, 1999

Preliminary Minutes – UTC #79 & NCITS Subgroup L2 # 176 Joint Meeting Palo Alto, CA – February 3-5, 1999 Hosted by Hewlett-Packard Company

Chair Aliprand convened the joint meeting of the UTC and L2 (L2 Ad Hoc) on Wednesday, February 3, 1999.

Administrative Items

UTC Membership Roll Call -- See Attachment 1 for list of Attendees

Lloyd Honomichl notified the Consortium that Novell membership is in abeyance because of reorganization.

Apple Computer, Inc. and Unisys Corporation presented proof that their membership fees had been authorized for payment.

Number of members in good standing as of this meeting = 15. Quorum = 8.

PRESENT: Apple Computer, Inc.; Hewlett-Packard Company; IBM Corporation; Justsystem Corporation; Oracle Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Unisys Corporation; Xerox Corporation

(Total members represented: 9)

NOT PRESENT (at time of roll-call): Booz, Allen, Hamilton, Inc.; Compaq Computer Corporation; Microsoft Corporation; Reuters, Ltd.; SAP AG; Sybase, Inc.

(Total not represented: 6)

Sybase representative arrived after roll-call: 10 members represented.

Technical Review of The Unicode Standard, Version 3.0

[Document UTC 1999-004, Unicode 3.0 book comments, from Paul Hoffman]

Paul Hoffman, of the Internet Mail Consortium, spoke about his comments. Users complain about "too many character sets in their mail." There is interest in UTF-8 because of the IETF recommendation. W3C prefers UTF-16 for XML, UTF-8 OK.

Encoding Forms and Encoding Schemes

Hoffman asked about the overlap of the names UTF-8 and UTF-16. Davis confirmed that UTF-8 is an encoding form and also an encoding scheme. UTF-16 is also both and there are also UTF-16BE & LE encoding schemes.

Action Item 79-1 for Freytag: Write proposed text to cover these points resulting from the discussion with Hoffman:

UTF-8 is the encoding scheme of the UTF-8 encoding form.
UTF-16BE, UTF-16LE and ambiguous "UTF-16" (no designation) are encoding schemes of the UTF-16 encoding form.
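
Illustrative sketch (not from the meeting documents): the form/scheme distinction can be seen with Python's standard codecs, assuming that the codec names `utf-16-be`, `utf-16-le` and `utf-16` stand in for the three encoding schemes named above. The sample string is arbitrary.

```python
import codecs

text = "A\u00e9"  # two BMP characters: U+0041, U+00E9

# One encoding form (16-bit code units), three encoding schemes (byte sequences).
be = text.encode("utf-16-be")     # big-endian, no byte order mark
le = text.encode("utf-16-le")     # little-endian, no byte order mark
unmarked = text.encode("utf-16")  # Python writes a BOM, then platform byte order

print(be.hex())  # 004100e9
print(le.hex())  # 4100e900
print(unmarked[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))  # True

# All three byte sequences decode back to the same characters.
decoded = (be.decode("utf-16-be"), le.decode("utf-16-le"), unmarked.decode("utf-16"))
print(all(d == text for d in decoded))  # True
```

The unmarked "UTF-16" case is the only one of the three in which a reader has to inspect the data to learn the byte order.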

Hoffman: Is a scheme the same as a transformation? Davis: A transformation is a type of scheme. What other transformations are there? Whistler drew an analogy with 8-bit sets. The key part is that for a given CCS, there is more than one way to represent its serialized form. Davis commented that you could consider EBCDIC to be a transformation of ASCII. Umamaheswaran said that it should be the other way around, since EBCDIC was earlier.

10:20 am: Arnold Winkler presented a proxy appointing him to represent SAP AG. 11 members represented.

10:35 a.m.: Murray Sargent, representing Microsoft Corporation, arrived. 12 members represented.

Surrogates

Hoffman said that IETF assumes extensibility from Day One. Danger of implementations unready for surrogates.

Freytag said that, except for occasional breaking of a pair due to random truncation, you do not need to do anything special for surrogates. We don't foresee a lot of market pressure to implement them.
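
Illustrative sketch (not from the meeting documents): the truncation case Freytag mentions can be shown in a few lines of Python. The helper names `utf16_code_units` and `safe_truncate` are hypothetical, and U+20000 is used only as an example of a code point that needs a surrogate pair.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF

def safe_truncate(units, limit):
    """Keep at most `limit` code units without leaving a dangling high surrogate."""
    cut = units[:limit]
    if cut and len(cut) < len(units) and is_high_surrogate(cut[-1]):
        cut = cut[:-1]  # the matching low surrogate was cut off, so drop the high one too
    return cut

units = utf16_code_units("a\U00020000b")
print([hex(u) for u in units])                    # ['0x61', '0xd840', '0xdc00', '0x62']
print([hex(u) for u in safe_truncate(units, 2)])  # ['0x61']
print([hex(u) for u in safe_truncate(units, 3)])  # ['0x61', '0xd840', '0xdc00']
```

Code that only copies, concatenates or compares whole strings never reaches the truncation branch, which is the sense in which surrogates pass through unnoticed.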

Hoffman said that pairing for future expansion should be planned for even though you can ignore them for now.

Freytag responded that a clever implementation of international text works on strings anyhow. If you do this, surrogates bypass you "magically" except for a small number of cases.

Davis, Hiura, and Hart commented on their examination of Java with respect to surrogates. One thing affected is properties. You have to change the API a little, know the size of the "thing in string." The effect of this is that you can then deal with other "clumps."

Hiura said he shares the same concerns as Hoffman. The UTC should give a stronger recommendation to implement surrogates. The assertion that surrogates will be used only for "extremely rare characters" is not true in the East Asian context, and the statement may not be agreed to by the IRG. For example, some of the ideographs required for Hong Kong by the government are included in CJK Extension B.

It was suggested that language tagging supported via characters on Plane 14 will assist text to speech conversion. There was a difference of opinion on how useful Plane 14 tagging will be for this.

Byte Order Mark

Rewrite the text on the need for the Byte Order Mark (p. 227, 3rd paragraph). Take into account digital signatures (and other counting of number of bytes), but do not focus entirely on this.
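
Illustrative sketch (not from the meeting documents): the concern about signatures and byte counts can be reproduced with Python's standard library. SHA-256 is chosen here only as a stand-in for whatever digest a signature scheme would use, and the sample text is arbitrary.

```python
import codecs
import hashlib

text = "Unicode"
payload = text.encode("utf-16-be")          # big-endian bytes, no BOM
with_bom = codecs.BOM_UTF16_BE + payload    # the same text preceded by a BOM

print(len(payload), len(with_bom))  # 14 16
same_digest = hashlib.sha256(payload).digest() == hashlib.sha256(with_bom).digest()
print(same_digest)  # False
same_text = with_bom.decode("utf-16") == payload.decode("utf-16-be")
print(same_text)  # True
```

The two byte sequences carry the same text but differ in length and digest, so a process that adds or strips the mark changes anything computed over the raw bytes.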

Action Item 79-2 for Allen: Send Davis the most current version of page 277 (where description of byte order mark appears) in plain text form.

Action Item 79-3 for Davis: Draft new wording for whole of page 277 to apply just to UTF-16, not to UTF-16BE or UTF-16LE.

Action Item 79-4 for Davis: Request Paul Hoffman to review new text on UTF-16.

Hoffman noted that the IETF character sets mail list is discussing the UTF-16 draft. He will post a notice to "unicore", most likely next week.

Meeting resumed at 1:10 p.m.

10 members present and 1 proxy. (Hewlett-Packard representative temporarily absent)

Soft Space

[Document L2/99-037]

This character was proposed at the WG2 meeting in London to meet a need for justification when Khmer text is in tables.

Becker said that a recommendation on how to use this character was lacking. Davis: Soft space can be indefinitely expanded for justification, but space is what is normally used. Freytag said he knows of an actual case in English where the space character had limits. McGowan said it was premature to encode this character because neither the need for it nor its semantics have been sufficiently explained. Becker noted that this is a format/layout character.

Moved by Davis, seconded by Becker

[#79-M1] Motion: After further consideration, the UTC stands by its previous decision not to encode SOFT SPACE at this time. We now understand that SOFT SPACE should not be unified with ZERO-WIDTH SPACE, but we do not have enough information to conclude that SPACE and ZERO-WIDTH SPACE used together are insufficient for Khmer. We do not have a specification for behavior of the SOFT SPACE character.

Unanimous

Motion approved.

Myanmar Vowel Sign E

[Document L2/99-036]

Davis, playing devil's advocate, asked: Why not follow the Thai model?

Whistler said there was nothing on record showing that this character was discussed at WG2 in London. He has received no answer from Myanmar. Aspects of Myanmar implementation suggest this might be O.K. but this is a presumption without specific information from Myanmar.

Moved by Becker, seconded by McGowan

[#79-M2] Motion: The UTC accepts encoding in logical order as per document L2/99-036 on the Myanmar VOWEL SIGN E.

10 for; 0 against; 1 abstention (SAP)

Motion approved.

It was noted that this decision has no impact on WG2.

Action Item 79-5 for Aliprand or Whistler: Send a formal letter from the UTC to the MITSC to request their opinion on the UTC decision re Myanmar VOWEL SIGN E.

Action Item 79-6 for Whistler: Send Aliprand the address for MITSC.

Hewlett-Packard representative returned.

Eyelash RA

[Document L2/99-026]

Comment submitted by James Agenbroad

Davis said that the ZWJ should not be significant in sorting, but the eyelash RA (currently represented using ZWJ) and the regular RA sort differently in Marathi. Agenbroad’s proposal does not agree with ISCII.

Moved by Davis, seconded by Becker

[#79-M3] Motion: That the UTC follow ISCII practice, treating the eyelash RA as the half-form of RRA. Retain the old behavior using the ZERO-WIDTH JOINER for compatibility.

8 for; 0 against, 4 abstentions

Motion approved.

Action Item 79-7 for Editorial Committee: With the proviso that existing behavior is to be retained, rewrite the text on rendering of Eyelash RA to address items 1 & 2 in document L2/99-026. Items 3 & 4 are to be treated as input to the Editorial Committee.

Checklist for a Procurement

[Document UTC/1999-003, Additional Requirement for Unicode 3.0 Conformance]

Davis commented on the 4th bulleted topic, which includes scope of support for character properties. Considerations here include support for BiDi if Arabic is displayed. There is also the issue of full versus partial support.

Action Item 79-8 for Editorial Committee: Clarify what it means to "support" characters. Augment 2.9 to show hierarchical support.

Action Item 79-9 for Davis: Supply draft text for AI 79-8.

Action Item 79-10 for all: Send examples that could be used in the Checklist for a Procurement document to Editorial Committee.

Action Item 79-11 for all: Send comments on UTC/1999-003 to Hart.

U+231B HOURGLASS

Freytag reported on the problem. The glyph for U+231B in Version 1.0 is a double triangle (point to point) which is a symbol for "CANCEL." The glyph in Version 2.0 is a picture of an hourglass.

Whistler withdrew his proposal to revert to the Version 1.0 glyph for this character. Freytag has not received objection from reviewers of the 10646 second edition code charts. Ksar said that it is important to point out discrepancies to reviewers. Davis said he would have no objection to adding another character to preserve mapping.

Moved by Davis, seconded by Freytag

Motion: Do not bring it up in Fukuoka, but the UTC would support adding the double triangle character if needed for ISO 2047 mapping (if it comes up in Fukuoka).

[See below for voting]

Freytag said that he would submit a personal expert contribution to encode the double triangle form.

Moore said that there were more hourglasses in the math proposal. Whistler said there was a need for care because changes can invalidate previous mappings. Freytag said he would modify his expert contribution to say that U+231B HOURGLASS should be used for now, until there is a math character.

Voting on the motion re support for double triangle character:

3 for; 3 against; 6 abstentions

Motion failed.

Whistler commented that no action is needed at this time. If the glyph question is raised, the response to error in mapping for ISO 2047 should be to report on the anticipated character in the math proposal.

Greek Letter Koppa

[Document L2/99-018]

The Greek letter koppa has two different forms, one of which is primarily used to represent a numerical value.

By consensus: In-principle, the UTC supports disunification of koppa for addition to ISO/IEC 10646 after the second edition.

Action Item 79-12 for Freytag: Draft a joint Unicode/US summary proposal for upper and lower case forms of a new character, SIGMOID KOPPA.

Action Item 79-13 for Aliprand: Bring a copy of the TC 46 standard for Greek (which includes koppa) to Thursday’s meeting.

Action Item 79-14 for Winkler and Suignard: Prepare text for US response to Amd. 30 ballot. Note that the shape of U+03DE GREEK LETTER KOPPA is wrong. It should be the bowl form.

Armenian

[Document L2/98-426]

The UTC expressed interest in encoding the Eternity sign for post 3.0. The UTC felt that there is a need for the Armenian NSB to review ISO/IEC 10646 and its AMDS. For signs that the Armenian NSB considers missing, much more information about usage of these characters needs to be provided.

Action Item 79-15 for Winkler and Aliprand: Prepare US response on Armenian for submission to WG2.

Freytag has a question about the width of the zero-width space: Does it ever expand? Sargent said the name suggests that it won’t ever get expanded. Davis said that the soft space discussion just raised this issue. The usage of ZWSP in Thai implementations is not known.

THURSDAY, FEBRUARY 4

PRESENT: Apple Computer, Inc.; IBM Corporation; Justsystem Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Sybase, Inc.; Unisys Corporation; Xerox Corporation

BY PROXY: SAP AG

(Total members represented: 9)

NOT PRESENT (at time of roll-call): Booz, Allen, Hamilton, Inc.; Compaq Computer Corporation; Hewlett-Packard Company; Microsoft Corporation; Oracle Corporation; Reuters, Ltd.

(Total not represented: 6)

Representatives of Hewlett-Packard Company, Microsoft Corporation, and Oracle Corporation arrived after roll-call: 12 members represented.

Approval of the Minutes of the previous joint meeting and review of Action Items was deferred.

Changes to Unicode Data

[Document L2/99-039]

Whistler led the discussion. It was an oversight to not include the Version 1.0 names that had been used for Tibetan characters. Davis asked: Will Tibetan names be clarificatory comments? Whistler said that some will. Aliprand asked whether these names will be included in the names index. Freytag suggested checking what should be in this.

A suggestion for the Editorial Committee was noted: To advise users that the CD ROM database can be used as a last resort aid to find characters by name.

Moved by Davis, seconded by Moore

[#79-M5] Motion: That the UTC accepts the changes to UnicodeData specified in document L2/99-039.

Unanimous

Motion approved.

Davis asked: When looking at variant letterforms for Greek, why aren't these compatibility variants? Whistler said that it is because they have always been that way, from 1.1.5. Freytag said that some Greek variations are in independent use as technical symbols, so there would have to be font/symbol designation. The character name shows relationship.

Davis said that there are acceptable font variants in the Greek language context. The Greek national standards body opposed their addition, but distinctions are needed for technical use. Would not propose to use <font> to make the distinction; <compat> would be preferable.

Davis said <compat> should be used where, in some instances, the form can be a free variation. Some environments use one form; e.g. U+03D1 is theta used as a technical symbol. Theta appears in two forms: U+03B8 and U+03D1. Freytag added the case of U+00B5 versus U+03BC, MICRO SIGN vs. Greek mu. Davis noted that this second case is treated as <compat>. Freytag said the distinction is important for transcoding.

McGowan: Want to do <compat> for cases where compatibility character would not have been encoded. Technical symbols would have been encoded, so should not be regarded as <compat>. Freytag said that which one you pick can be damn important in a mathematical context.

Whistler: The effect is most apparent in collation. To make these work as people expect in collation, they should collate together. To make this work, we need to give them the <compat> designation. Freytag said he was satisfied. If collation was the only reason for making the distinction, we should designate a new type "collate." However, <compat> aids fuzzy transcoding with multiple sets.

Davis noted that if you translate to <compat> forms you lose data. This is documented in the book. He does not know if it is used as a fallback mechanism.

Moved by Davis, seconded by Carroll, amended by Whistler (accepted)

[#79-M6] Motion: To give compatibility decompositions with the label <compat> to U+03D0, 03D1, 03D2, 03D5, 03D6, 03F0, 03F1, 03F2.

9 for; 0 against; 2 abstentions

Motion approved.

Action Item 79-16 for Whistler: Add the compatibility decompositions for the characters U+03D0, 03D1, 03D2, 03D5, 03D6, 03F0, 03F1, 03F2 to the UnicodeData database.
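
Illustrative sketch (not from the meeting documents): the effect of a <compat> decomposition can be inspected with Python's `unicodedata` module. This assumes the Unicode data bundled with a current Python, which is a much later version than the one under discussion but carries <compat> mappings for these code points. U+00B5 MICRO SIGN is included because it was cited in the discussion.

```python
import unicodedata

variants = ["\u03d0", "\u03d1", "\u03d2", "\u03d5", "\u03d6",
            "\u03f0", "\u03f1", "\u03f2", "\u00b5"]

for ch in variants:
    # decomposition() returns the raw mapping field, e.g. "<compat> 03B8"
    print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch))

theta_symbol = "\u03d1"
# Canonical normalization keeps the technical symbol distinct ...
kept = unicodedata.normalize("NFC", theta_symbol)
# ... while compatibility normalization folds it into the ordinary letter.
folded = unicodedata.normalize("NFKC", theta_symbol)
print(kept == theta_symbol, folded == "\u03b8")  # True True
```

The second print shows the data loss Davis described: after compatibility folding, the technical symbol and the ordinary letter can no longer be told apart.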

Whistler said that the Read-Me file in Unicode Character Database is inconsistent with what is in Version 3.0. If case is a normative property, then subclass property should also be normative. This is not case mapping, but a general property. Moore agreed that this is oversight.

Freytag asked whether the Read-Me is normative or an annotation to the data file. McGowan said that it is an annotation. But Whistler pointed out that in some cases it is the only source of information.

Freytag proposed that we also say "this file contains normative information." We can't pin stuff down too firmly in the book because we could change the structure of the file at future date. Davis suggested renaming the "Read-Me" file because it is not a normal Read-Me file. It could be called "Data Structure," for example. Whistler objected because it would require too many changes.

Freytag said that we have legal disclaimers but do not say this file is an integral part of Unicode character database. When it comes to particular field we are trying to fix, the Editorial Committee needs to improve the description of "normative."

Action Item 79-17 for Editorial Committee: (As summarized by Freytag:)

1. Add language to Unicode Character Database Read Me file to say that this file is an integral part of Unicode character database

2. Make corrections to Read Me file to show that Lu, Lt, Ll categories are normative.

3. Review text in book to make sure that the normative properties are correctly defined.
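
Illustrative sketch (not from the meeting documents): the case-related general categories are Lu (uppercase), Ll (lowercase) and Lt (titlecase). They can be read from the character database with Python's `unicodedata` module; the three sample characters are arbitrary.

```python
import unicodedata

samples = ["A", "a", "\u01c5"]  # U+01C5 is a titlecase digraph
categories = [unicodedata.category(ch) for ch in samples]

for ch, cat in zip(samples, categories):
    print(f"U+{ord(ch):04X}", cat)
# U+0041 Lu
# U+0061 Ll
# U+01C5 Lt
```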

Action Item 79-18 for Whistler: Change Unicode Character Database Read Me to incorporate Editorial Committee text.

Proposed BiDi Changes

[Document L2/99-038]

Editorial issues are not covered in L2/99-038 e.g. to clarify between examples, informative and normative parts.

Many of the proposed normative items have no effect on the algorithm itself. Old rules were based on looking at characters. Item A specifies looking at properties instead.

Freytag said that Martin Dürst sent the proposed changes to the W3C I18N group and got feedback.

Freytag said that the draft implementation of the BiDi algorithm has been updated, and he can confirm the changes have no great effect. Two minor edge cases had to do with empty embeddings. In large part, it is stable; the new work was fine-tuning relative to in-stream and external embedding codes. It took about twice as long as expected to implement sample code. Some of the changes will facilitate the process of implementation, and indicate that you are on the right track.

Whistler agreed with Freytag’s comments about implementability. He would like this document changed into a formal proposal, with a clear distinction made between normative and editorial changes. "Normative" in this context is determined by the output of BiDi algorithm.

Moore pointed out that there are two levels of normativeness: (a) to the algorithm, and (b) to properties.

Whistler said that we must be clear when we are making normative changes to data.

Freytag said that the Version 2.0 algorithm cannot be used with Version 3.0 data and we need to make this clear.

Davis suggested that an alternative would be to set a preliminary rule to cover changes in character properties. Would have same effect in algorithm; would minimize data table changes (only "c" would be needed).

Whistler felt this would not be the right way, but agreed that we should make changes in data table. This must be a proposal, and the proposal must state which version of data table is to be changed. Davis said the version should be 3.0. Whistler: Then the proposal should state that the changes apply to Version 3.0. Freytag’s suggestion to indicate which parts are normative in the revision of L2/99-038 was accepted.

McGowan said that it would be premature to adopt these changes at this meeting. Moore pointed out that most of the implementers were at the BiDi ad hoc meeting. McGowan's concern was about properties. He wanted to see a list of all characters that are affected. He was not as concerned about changes to the algorithm. Freytag said that a revised document with amendments would be available tomorrow. McGowan explained that he wanted a precise list so there will be no unexpected effect on publication of Version 3.0.

Texin expressed his concerns: Most of us have to accept text from different sources. If we compare the 2.1 and 3.0 algorithms, what is typical error from 2.1 to 3.0 and what is the worst case? Davis said that neutrals between numbers will behave differently.

Freytag: Discussed the proposed changes and the effect of embeddings with examples.

Whistler pointed out that "HL" is not defined. McGowan said re item B(c) that he does not think we can make normative statements about unassigned characters. Freytag said that, personally, he would be OK with a strong recommendation on this. Whistler said it would be OK to build this into the BiDi algorithm, but it should be labeled as a caveat with the justification that it is intended to make implementations consistent. There is an effect on Chapter 4 also.

In item G, there needs to be an explanatory note to answer the question "Why 61?" Ksar said that W3C asked about this.

Texin asked whether unassigned characters will be included in properties database (to indicate how to treat them for BiDi). It was agreed that unassigned characters will not be included.

Whistler said that other than the changes that had been noted, he is delighted by this work.

Freytag brought up two small issues:

1. Ignore empty embeddings for base level.

2. Fine point re numerics: BN between separator and numbers.

Moore supports this proposal, as it is the simplest fix.

Action Item 79-19 for Davis: Revise BiDi report for Friday.

Revised Proposals for Arabic Characters

Mansour and Davis withdrew this agenda topic.

Glagolitic Script

[Document L2/99-012]

Due to shortness of time, McGowan withdrew discussion of all stable category-1 script proposals except Glagolitic, and also preliminary discussion of architectural issues re scripts of scholarly importance.

Moved by McGowan, seconded by Becker

[#79-M7] Motion: That the UTC accept Glagolitic script (as specified in document L2/99-012) for encoding post 3.0 with suggested code points as in the Road Map (i.e., beginning at 1D80), except for the upper and lower case forms of the character JO, which must be rotated to the end, following IZHICA.

9 for; 0 against; 3 abstentions (Sybase, Hewlett-Packard, SAP)

Motion approved.

Collation

[Documents L2/99-042, Draft UTR #10, Collation and L2/98-381, FCD 14651, International String Ordering]

E-mail discussion with the 14651 people. Aim is to ensure that common requirements are conformant to 14651. Tricky issues, e.g. French accents with non-French text. Limit requirements for conformance. Boundary neutrals need more clarification. Level 4 weights under discussion, aiming for smaller sort keys.

Whistler: Updating default data tables. Re 14651: L2 is to vote. Need to provide technical input.

Action Item 79-20 for Davis: Draft US comments on 2nd FCD 14651 for L2 consideration.

Action Item 79-21 for Davis & Whistler: Amend "Status of this document" to indicate additions to UTR #10 since approval by the UTC.

Agreed by consensus that this policy could apply generally to all draft UTRs, as this will move them forward. The Ad Hoc sub-committee which "owns" the UTR will be responsible for updating the status information.

Whistler has received no comments on 14651. Texin offered to send some. Winkler noted that 12 March is the closing date for ballot comments.

Action Item 79-22 for Texin: Send comments on 2nd FCD 14651 to Whistler.

Whistler (in reply to a question from Ksar) said that the 2nd FCD has changed in response to US comments.

Davis said that the FCD has undergone many changes, but still needs to be changed. Specifically:

The conformance clause is unclear, and needs to be restricted to a specific number of levels.
The FCD requires that a conformant implementation handle position value (a POSIX feature); it should just allow it.
The backwards field marker does not work. Some characters are shared between scripts.

McGowan felt that level should be forward or backward.

Whistler: Realistically, SC22 does not want successive ballots without results. The aim should be to get 14651 correct enough so that a Unicode implementation can be conformant in some way. We also want to preserve parts we don't want changed, e.g., BNF Syntax.

Proposed Draft UTR #13, Unicode Newline Guidelines

Comments have not yet been incorporated. Winkler (as proxy for SAP) reported that SAP has no additional comments and would like to see it published.

Moved by Davis, seconded by McGowan

[#79-M8] Motion: The UTC authorizes publication of Newline Guidelines as Unicode Technical Report #13 after all the comments have been incorporated.

11 for; 0 against; 1 abstention (Xerox)

Motion approved.

Action Item 79-23 for Davis: Ask Moore to review text of UTR#13, Newline Guidelines, before publication.

Proposed Draft UTR #15, Unicode Normalization

[Documents L2/98-404, and L2/99-044 (comments from Martin Dürst)]

Comments from Dürst are mostly editorial. There was a significant question re concatenation.

McGowan: We need to say that the canonical form of concatenated strings is undefined. Freytag added: But it is triggered by identifiable condition (string has initial non-spacing mark).
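
Illustrative sketch (not from the meeting documents): the concatenation problem can be reproduced with Python's `unicodedata` module, using NFC as the example normalization form. The helper name `is_nfc` is hypothetical.

```python
import unicodedata

def is_nfc(s):
    """True if s is unchanged by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s) == s

left = "a"
right = "\u0301b"  # begins with COMBINING ACUTE ACCENT, a non-spacing mark
joined = left + right

print(is_nfc(left), is_nfc(right), is_nfc(joined))  # True True False
# The condition Freytag identified: the second string starts with a combining mark.
starts_with_mark = unicodedata.combining(right[0]) != 0
print(starts_with_mark)  # True
```

Each operand is normalized on its own, yet the joined string is not, because the leading mark of the second string composes with the last character of the first.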

Ksar said that in the W3C meeting, discussion was also about URLs and URLs using Unicode characters. ECMAScript makes the assumption that all strings are normalized.

(3) has problems, would not be canonically equivalent. Either fix (3) or make (4) more understandable.

Moore suggested an "executive intro" for the TR.

Action Item 79-24 for Moore: Work with Davis on "Executive Introduction" for draft UTR #15, Unicode Normalization.

Freytag suggested stating the benefits of (4) higher up. Winkler conveyed comments from SAP on lack of definitions, and the need for better short names.

Moved by Davis, seconded by Yang

[#79-M9] Motion: To remove compatibility mapping from characters U+1100 through U+11F9 (hangul jamo block).

Motion approved by consensus.

Action Item 79-25 for Whistler: Implement compatibility mapping change to characters of the hangul jamo block in data file.
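
Illustrative sketch (not from the meeting documents): Python's `unicodedata` module can be used to check which jamo carry a compatibility mapping. This assumes the Unicode data bundled with a current Python, in which the conjoining jamo in U+1100 through U+11F9 have no decomposition, while the separate Hangul Compatibility Jamo block (for example U+3131) does.

```python
import unicodedata

jamo = [chr(cp) for cp in range(0x1100, 0x11FA)]  # U+1100 through U+11F9
no_mappings = all(unicodedata.decomposition(ch) == "" for ch in jamo)
print(no_mappings)  # True

# A character from the Hangul Compatibility Jamo block, for contrast.
print(unicodedata.decomposition("\u3131"))  # <compat> 1100

# Conjoining jamo still compose canonically into precomposed syllables.
syllable = unicodedata.normalize("NFC", "\u1100\u1161")
print(syllable == "\uac00")  # True
```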

Whistler said that this change is large. He recommended a 2.1.9 update, noting that collation work has to be driven off a 2.1.x version.

Moved by Freytag, seconded by Davis

[#79-M10] Motion: That the UTC authorize creation of a new minor version of the Unicode database, designated version 2.1.9.

Motion approved by consensus.

Whistler said he will roll in all property changes from this meeting EXCEPT BiDi property changes. (He may include correction of errors identified through the BiDi work.)

Action Item 79-26 for Whistler: Create new minor version 2.1.9 of the Unicode character database, incorporating all property changes from this meeting except those relating to BiDi.

Action Item 79-27 for Davis: Respond to Dürst about Unicode Consortium policy on patents, which is: Any proposal that requires use of a patent will either not be adopted by UTC or the proposer must provide clearance.

Action Item 79-28 for Officers:

Update the Consortium’s policy on patents to add the following points:

1. Don't adopt technologies that are under patent claim

2. Need to close loophole re unknown patents

3. No requirement to do patent search

Members should not make a proposal that requires a patent technique unless they provide clearance at this point for standardization and future free use.

Whistler said that feedback on Hebrew is to decompose. Becker asked about Arabic. Whistler said that as of 2.1.9, there were very few cases. A solution is required for Hebrew.

Action Item 79-29 for Aliprand: Put normalization of Hebrew on the agenda for the June meeting.

Proposed Draft UTR #16, UTF-EBCDIC

[Document L2/99-034R]

Moore commented that "UTF" in the title is not a good name. Davis said that it is still algorithmic, but it just happens to use a table. Whistler said: When you end up, you have a series of bytes that convey the same information as the source bytes. Umamaheswaran pointed out that it is also fully reversible.

Davis recommended dropping normalization (p.6). Winkler said that SAP also had this comment.

Yang made recommendations about reworking of Table 3.

Action Item 79-30 for Umamaheswaran & Yang: Get together to discuss organization of Table 3 in Proposed Draft UTR #16, UTF-EBCDIC.

Action Item 79-31 for Umamaheswaran: Respond to comments from SAP on PDTR #16.

Action Item 79-32 for Umamaheswaran: Revise PDTR #16 to incorporate comments.

C1 Controls

[Document L2/99-048]

Umamaheswaran brought up the treatment of the C1 control space in the Unicode Standard.

Freytag said that we have been partially inconsistent in our treatment. We could treat the values as ambiguous, or treat them as "not characters." He expressed concern that, slowly but surely, control codes would come to be used only for "official uses."

Whistler said that C0 & C1 are counted as 65. The inconsistency is in assignment of names to C0 range, but not to C1.
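
Illustrative sketch (not from the meeting documents): the count of 65 can be checked against the character database with Python's `unicodedata` module, assuming that the general category Cc corresponds to the C0 and C1 controls plus DELETE. The placeholder string passed to `name()` is an arbitrary default.

```python
import unicodedata

# Collect every code point whose general category is Cc (control).
controls = [cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"]

print(len(controls))  # 65
print(hex(controls[0]), hex(controls[31]), hex(controls[32]),
      hex(controls[33]), hex(controls[-1]))
# 0x0 0x1f 0x7f 0x80 0x9f

# Control characters have no character name in the database,
# so name() falls back to the supplied default.
placeholder = unicodedata.name("\n", "<control>")
print(placeholder)  # <control>
```

The 65 code points are the 32 C0 values, DELETE (U+007F), and the 32 C1 values.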

McGowan was opposed to addition of definitions for the C1 range without a specification.

Davis said that listing names for the C0 range suggests that these are encoded as Unicode characters. However, the names are useful for usability. He suggested an informative note to link to "code in box." Freytag pointed out that in ISO/IEC 10646, the C0 and C1 values are outside the standard entirely.

Whistler said that a wide part of the industry has a common understanding of C0 use. Umamaheswaran said that his concern was about use of C1 to encode graphic characters.

Freytag asked whether we recognize these code values as characters. Davis replied that all of them are in the Unicode character database. Freytag asked: Then what are the semantics? Davis responded that his reading of Umamaheswaran's document is not to add new characters, but just to add information in the charts. Except for TAB, CR, LF, he suggested putting "SOH in ISO 6429." The code points may have other semantics in other protocols.

Freytag said that this proposal would add 2 pages to the book, and would force C1 away from the opposite facing page. (Whistler disagreed with this.) Putting the names in angle brackets would be a change to the code chart, although it would be appropriate for consistency. Becker asked how TAB, CR, LF would be treated. Ksar pointed out that there are no names for the C0 and C1 characters in 10646.

The following options were considered:

1. Do nothing (status quo): C0 not the same as C1.
2. Whistler proposal: C0 and C1 treated the same. (Image as dotted square with ISO 6429 abbreviation, name = <control>; annotation to identify function in other standards, e.g. Line Feed in ISO 6429)
3. Freytag proposal: C0 and C1 treated the same. (Image as dotted square with ISO 6429 abbreviation, name = function in ISO 6429, e.g., LINEFEED; general annotation below heading)
4. As now, but with annotation of C0.

These were evaluated by straw polls. "Can live with" was inconclusive; "prefers" yielded more information.

Moved by Davis, seconded by Whistler

[#79-M11] Motion: That the C0 and C1 control characters be represented in the names list as follows: character code, glyphic image consisting of a dotted box containing the abbreviation for the character, the name <control>, and aliases for the character’s equivalent in other standards.

8 for; 0 against, 2 abstentions (Sun, Justsystem)

Motion approved.

Action Item 79-33 for Davis: Have Cora Chang create glyphs for the C1 control characters.

Action Item 79-34 for Whistler: Amend entries for the C0 and C1 control characters in the annotated name list that is used for generation of the charts.

Umamaheswaran pointed out that "Delete" is not part of ISO 6429. Davis proposed that we also annotate the "delete" character. Agreed by consensus.

Moore was concerned about lack of consistency between ISO/IEC 10646 and the Unicode Standard. Ksar suggested that this should be part of US comments on the second edition. Freytag argued against this: he believes the final version of the new edition of ISO 10646 will be O.K.

Iota Subscript

Davis said that based on the information he had found, what we have is O.K. No change to the decision at the December 1998 UTC meeting is needed.

Eyelash RA (continued)

[Document L2/99-046, Specification of UTC Resolution on Eyelash RA]

This document summarizes UTC discussion and decisions of Wednesday, and specifies what needs to be done to the Version 3.0 text.

Action Item 79-35 for Editorial Committee: Change treatment of eyelash RA based on L2/99-046.

Weierstrass Symbol

Recommendation to L2: Request WG2 to note the name discrepancy (though named "SCRIPT CAPITAL P" it is actually a lower case form) in Annex P.

Davis noted that this is not a cased character, but a symbol.

Action Item 79-36 for Whistler: Change annotation for U+2118 to "this has the form of a lower case calligraphic p, despite its name."
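For reference, a minimal sketch of how the points about U+2118 (named "capital", but a symbol without case) can be checked against current character data. It uses Python's standard unicodedata module; the function name is_uncased_symbol is an assumption made for illustration, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

WEIERSTRASS = "\u2118"

def is_uncased_symbol(ch: str) -> bool:
    """Hypothetical check: general category is a Symbol and no case mapping applies."""
    is_symbol = unicodedata.category(ch).startswith("S")
    has_no_case_mapping = ch.upper() == ch and ch.lower() == ch
    return is_symbol and has_no_case_mapping

# The formal name still says "CAPITAL", which is the discrepancy discussed above.
print(unicodedata.name(WEIERSTRASS))
print(is_uncased_symbol(WEIERSTRASS))
```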

Math Braces, etc.

[Document L2/99-043]

Sargent reported on Action Item 78-33: Develop proposal for a complete repertoire of parts for braces, etc. found in printer fonts.

Sargent said aligned vertical bars are needed for correct representation. McGowan commented that we should add no more than are needed to support existing fonts. He pointed out that Postscript uses a single vertical bar. Freytag said that at least 2 widths are needed, for integral & braces. Unification with existing characters may be imposing a constraint on connection.

Davis asked whether all the proposed characters are in existing fonts. Sargent & Carroll confirmed this. Freytag pointed out that the Microsoft Symbol font has 5 verticals. Carroll said that multiple encodings may point to a single glyph. McGowan reiterated that he wants to avoid encoding more verticals than we need.

Action Item 79-37 for Sargent: Provide more information on the metrics of "math braces" (pieces of math characters used in printing).

Action Item 79-38 for Sargent: Provide information on mapping the "math pieces" to Postscript fonts, etc. (Benefit: would complete Symbol font mapping.)

Math Variants

[Document L2/99-045]

Sargent reported on Action Item 78-41 (a): variant definitions for all math "alphabets"

McGowan said that we should face the music for lightweight mark up protocols. Small sets of characters with tight semantics are a good thing. We need a meta-discussion on this.

Davis is leaning towards a swath of a higher plane for individual characters, because it saves having to support them in the rendering engine. McGowan argued that you still have to support other features for math, e.g. layout and searching.

Davis argued that the difference from ideographic variants is that there is a specific table lookup for these. Math is open-ended.

Whistler raised the problem of combining character sequences. Mathematics will use them productively with other characters.

Direction of discussion was towards encoding on a higher plane.

Action Item 79-39 for Sargent: Develop operational rules for math variant characters.

Action Item 79-40 for Sargent: Take all permutations under the rules (see AI 79-39) and sketch out how the variant characters would be laid out in the Plane 1 area designated for Symbols.

Action Item 79-41 for Sargent: Solicit opinion of math community regarding encoding options.

Whistler opposed a size operator because different sizes are already encoded. Adding a size operator would introduce canonical equivalencies.

Shape/orientation variants

In response to a question, Sargent said that the unified number of characters is around 2000. Freytag said that there is a tradition of glyphic approach for these, so they may not be unifiable. There needs to be discussion on how unifiable they are. Whistler felt they should not be in Part A of the proposal.

Armenian

[Document L2/98-426, (NSB comments on SC 2 N 3134)]

Suignard presented a proposal for the US response to Armenian national standards body. It was recommended to add a pointer to instructions on how to submit characters to WG2.

FRIDAY, FEBRUARY 5

PRESENT: Apple Computer, Inc.; Hewlett-Packard Company; IBM Corporation; Justsystem Corporation; Microsoft Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Sybase, Inc.; Unisys Corporation; Xerox Corporation

BY PROXY: SAP AG

(Total members represented: 11)

NOT PRESENT (at time of roll-call): Booz, Allen, Hamilton, Inc.; Compaq Computer Corporation; Oracle Corporation; Reuters, Ltd.

(Total not represented: 4)

The representative of Oracle Corporation arrived after roll-call: 12 members represented.

Japanese national standards body Comments to SC2

[Document L2/99-040]

UTC consensus was that Unicode is the best test of market relevance.

Becker pointed out that the proposal for Buginese has no user community support. McGowan said that it is not always possible to obtain this, particularly for minority scripts. Aliprand felt that there was a need for scholarly support on proposals: the proposal for Syriac was a good example where the scholarly community had been consulted. The Glagolitic proposal, in contrast, had apparently not been taken to the Early Slavic Studies Association.

McGowan expressed concern about premature proposals being pushed into WG2. Ksar said that WG2 is not in a hurry on these proposals. If members are interested they will come to meetings.

Freytag said that the proposed method of handling the repertoire issue has a big problem. It could result in random disunifications. McGowan suggested that Ksar (as Convenor of WG2) invite the Irish national standards body to respond.

Freytag asked: How will the positions of Netherlands and Japan affect our work? Scripts of interest to vendor community are both living and extinct. McGowan said that we have tried to prioritize scripts, and we are not responsible for other proposals.

Freytag suggested prioritizing actual proposals (say, in Copenhagen). If we propose this to WG2, we need to have our proposals ready (say, the 5 scripts we think most important).

McGowan said the problem is not that relevant scripts are not being pushed, but national standards bodies are seeing many things that they can't contribute to. Also, there is the danger of approval when a proposal is premature. We don't want stuff coded that doesn't have user community support: in this, we agree with Japan. However, we don’t agree on "rareness" exclusion. We need a way to prioritize script proposals.

Umamaheswaran said that item #3 is being addressed by Action Items out of the London WG2 meeting. Ksar said that WG2 is not going to encode any additional characters or scripts unless they are mature.

BiDi

[Document L2/99-050]

Suignard asked how conformity for a BiDi implementation would be indicated. Davis replied that an existing implementation would be described as "conformant to Version 2.0." He added that John McConnell represented Microsoft at the BiDi Ad Hoc meeting in January.

Suignard expressed concern about the BiDi property for unassigned characters. Freytag asked if this related to a rule for the end of blocks. Davis said that a rule would be included.

Consensus on "NSM" instead of "CM."

Action Item 79-42 for Whistler: Properties listed in section C of L2/99-050 to be added/fixed in 2.1.9. All other property changes are to be held for 3.0.

BN = CC + CF that have BiDi type ON, e.g., SI, SO, backspace, etc. Should also include ZW space.
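As a hedged illustration, the BiDi classes named in this discussion (BN for boundary neutrals, NSM for non-spacing marks) can be looked up in current character data. The sketch below uses Python's standard unicodedata module; the SAMPLES table and the bidi_class helper are assumptions chosen for illustration, and the values come from the Unicode version bundled with the interpreter, not from the 2.1.9 data under discussion.

```python
import unicodedata

# Sample characters chosen to match the ones mentioned in the discussion,
# plus one combining mark to show the NSM class.
SAMPLES = {
    "BACKSPACE": "\u0008",
    "SHIFT OUT (SO)": "\u000e",
    "SHIFT IN (SI)": "\u000f",
    "ZERO WIDTH SPACE": "\u200b",
    "COMBINING ACUTE ACCENT": "\u0301",
}

def bidi_class(ch: str) -> str:
    """Hypothetical helper: return the bidirectional class of one character."""
    return unicodedata.bidirectional(ch)

for label, ch in SAMPLES.items():
    # Print code point, general category, and BiDi class side by side.
    print(f"U+{ord(ch):04X} {label}: gc={unicodedata.category(ch)} bidi={bidi_class(ch)}")
```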

Moved by Davis, seconded by Freytag

[#79-M12] Motion: To take the following actions:

1. Roll in all corrections from this meeting, restructuring the BiDi UTR;

2. Have the BiDi Ad Hoc committee review the draft

3. Editorial changes may be made by the BiDi Editorial Ad Hoc committee

4. Publish as a Unicode Technical Report.

11 for; 0 against, 1 abstention (SAP)

Motion approved.

The BiDi Ad Hoc committee addressed technical issues. The BiDi Editorial Ad Hoc committee consists of Davis, Moore, Ksar, and Freytag.

Thanks to Moore for organizing the meeting that was critical to making progress. (Applause)

ISO 14651

[Document L2/99-051, Proposal to UTC for comments on ISO 14651 ballot]

With respect to point 4 on data, Texin expressed concern about interoperability. Whistler said it is already a problem. The whole meta realm of definition of collations that make use of 14651 has not been defined. Davis said that you can do a satisfactory implementation with 3 levels. Texin asked: Doesn’t the Unicode algorithm have the same problems? Davis replied that you can have more than 3 levels. Whistler pointed out that how to specify interchangeable collation behavior has not been specified. Ksar recommended adding a note about interoperability, which is not covered by this standard.

Moved by Davis, seconded by ??

[#79-M13] Motion: To adopt document L2/99-051 with some changes as the basis for the U.S. position, and if possible, as a liaison statement from the Unicode Consortium.

10 for, 0 against, 1 abstention (SAP)

Motion approved.

Action Item 79-43 for Davis: Prepare new version of document L2/99-051 and send to Winkler by March 1.

Approval of the Minutes

Moved by Whistler, seconded by Jenkins

[#79-M14] Motion: To approve the revised minutes in document L2/98-281R.

9 for; 0 against, 3 abstentions

Motion approved.

Moved by Whistler, seconded by Jenkins

[#79-M15] Motion: To approve the minutes of joint meeting UTC #78/L2 # in document L2/98-419 as amended.

10 for; 0 against, 2 abstentions

Motion approved.

Hart praised the detail in these minutes. The Chair said that Tex Texin, who served as Secretary for the meeting, deserved the credit.

UTF-EBCDIC as Informative Annex to ISO/IEC 10646

[Document L2/99-034R]

Moved by Umamaheswaran, seconded by Davis

Motion: The UTC supports addition of UTF-EBCDIC as an informative annex to ISO/IEC 10646.

[See below for voting]

Hart said that SHARE would support this proposal. It provides a migration vehicle for existing systems.

Whistler said that Sybase favors the specification, but does not think it can support addition as Annex to 10646, because of the problem of proliferation of UTFs. Publication as UTR is preferable.

4 for; 6 against, 2 abstentions (Hewlett-Packard, Justsystem)

Motion failed.

Jenkins moved adjournment of the meeting; Moore seconded.

Meeting adjourned at 12:25 p.m.

ATTACHMENT 1

UTC #79 and L2 #176 Joint Meeting – Attendees

Wednesday, February 3, 1999

Joan Aliprand; RLG

Julie Allen; Unicode, Inc.

Joe Becker; Xerox

Don Carroll; Hewlett-Packard

Mark Davis; Unicode

Asmus Freytag; Unicode

Edwin F. Hart; SHARE

Hideki Hiura; Sun Microsystems

Paul Hoffman; Internet Mail Consortium

Mike Ksar; Hewlett-Packard

Tatsuo L. Kobayashi; Justsystem

Kamal Mansour; Monotype

Rick McGowan; Apple Computer

Lisa Moore; IBM

Murray Sargent III; Microsoft

Tex Texin; Progress

V. S. Umamaheswaran; IBM

Ken Whistler; Sybase

Arnold Winkler; Unisys

Jianping Yang; Oracle

Thursday, February 4, 1999

Michel Suignard; Microsoft

Friday, February 5, 1999

John Jenkins; Apple