UTC #76 - Unconfirmed Minutes

L2/98-158 
posted May 26, 1998

Draft Minutes - UTC #76 & NCITS Subgroup L2 #173 joint meeting
Tredyffrin, Pennsylvania - April 20-22, 1998

Chair Aliprand convened the joint meeting of the UTC and L2 (L2 Ad Hoc) at 1:35 p.m., Monday, April 20, 1998. An informational L2 meeting on TC 46 character sets had been held in the morning.


Administrative Issues

UTC Membership Roll Call -- See Attachment 1 for list of Attendees

PRESENT: Digital Equipment Corporation; Hewlett-Packard Company; IBM Corporation; Microsoft Corporation; Novell, Inc.; Oracle Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Unisys Corporation
BY PROXY: Apple Computer, Inc.
(Total represented: 10)
Quorum = 9

NOT PRESENT: Booz, Allen, Hamilton, Inc.; Mathema Software, GmbH; NCR; Reuters, Ltd.; Silicon Graphics, Inc.; Sybase, Inc.; Xerox Corporation. (Total not represented: 8)

[Note: Action Items 2 through 5 are within the L2 Minutes as Action Items 173-2, 173-3, 173-4, and 173-5]

Technical Issues

Formal Criteria on Disunification - Document L2/98-152

Freytag had been given the action item to revise this document on the basis of feedback from the February UTC/L2 joint meeting. He submitted the revision to WG2 in Seattle; this revision is document L2/98-152.

He proposed that the responsibility for this document and for Formal Criteria for Coding Precomposed Characters (L2/98-097) be transferred to Umamaheswaran as input for his WG2 procedures document. Umamaheswaran agreed to take the two documents and create a draft which would be procedures for both WG2 and the UTC.

Specific comments:

page 2, 1st bullet (at bottom): Add automatic font assignment.

Another benefit: Differentiation in case pairing.

Hiura suggested that there should be more background reasoning re Han unification. It was agreed that this section needs a pointer to the discussion of Han unification in the separate Annex.

Consensus to process this document as input to WG2 from UTC and L2.

The text for UTC/L2 input to WG2 will be from the note about Han unification through the end of the document, with modifications as proposed.

Action Item 76-6 for Freytag: Update L2/98-152 Formal criteria on disunification, assign new L2 number, post on the Unicode Web site and notify Umamaheswaran when done.
Action Item 76-7 for Umamaheswaran: Work with Freytag to prepare revision draft of L2/98-152, Formal criteria on disunification for July UTC meeting, including expansion of text re costs
Action Item for Aliprand/Winkler (in 76-1): Put revision draft of Formal criteria on disunification on agenda for UTC/L2 joint meeting in July.

Formal Criteria for Coding Precomposed Characters -- Document L2/98-097

This document was submitted at the WG2 meeting by Freytag and Whistler. The whole text is eligible for the procedures document.

The first negative point "May not introduce ..." was considered unclear. Umamaheswaran summarized it as: No dual coding (spelling) for that script.

Ng asked whether there was relative priority on the positive and negative points, that is, how are conflicting goals to be resolved? Freytag replied that the intent of the document was to be flexible.

Umamaheswaran asked whether the notes were to be included. Freytag responded that the notes are important. They could be included in cost criteria.

Freytag noted that, at present, off-the-shelf technology does not provide adequate rendering technology. Hart commented that the addition of precomposed characters is a short-term solution with permanent costs.

Action Item 76-8 for Freytag: Arrange with Mike Ksar to post soft-copy of L2/98-097 Formal criteria for coding precomposed characters on the Unicode Web site. If soft-copy is not available, scan L2/98-097. Notify Umamaheswaran when soft-copy is available.
Action Item 76-9 for Umamaheswaran: Work with Freytag to prepare revision draft of L2/98-097, Formal criteria for coding precomposed characters, for July UTC meeting, including rewording of first negative point.
Action Item 76-10 for Freytag: Test disunification of combining characters using umlaut/diaeresis as a test case.

Honomichl objected to the use of a known debatable case, and suggested that the first use should be on obvious test cases.

Inline and Interlinear Annotations -- Support for implementing Inline and Interlinear Annotations in East Asian Typography -- Document L2/98-099

(Unisys out of room)

Freytag: The goal is to progress this proposal to a draft Unicode Technical Report.

Hiura pointed out that both he and Kobayashi had objected to this proposal at the August meeting. He believes the issue of ruby should be dealt with by a higher level protocol.

Freytag responded that he was given the action item to update this proposal at the August UTC. Aliprand commented that Freytag had said that he knows of three separate implementations that are using this technique.

(Unisys returned)

Hiura said that he does not know of any plain text implementations of ruby in Japan, Korea, or China. He asked: is it essential at this stage? With respect to the suggested fall-back behavior, he can show that the use of parentheses is not a good solution.

Freytag said that there are three or more different ways: dispersed; centralized; associated with a character.

Suignard said that you could have a marker in the annotation to show its type. The annotation proposal is intended as a hint for higher level processing. Can mark plain text with the beginning and ending of the annotation. Microsoft is using C1 values for the markers now.

Freytag said that we need to distinguish between rich text formatting and minimal legibility. This proposal also provides support for instream implementation as opposed to data interchange; there are a few other examples of characters in Unicode whose primary use is to facilitate internal processing.

The plain text backing store will have formatting information on the side. It sometimes helps to have an instream marker, and codes must be reserved for such characters. The characters in this proposal are similar to the Object Replacement Character (OBJ). It is not possible to use private use characters without losing some generality.

What is a sufficiently general model for these characters to allow minimal legibility? He would prefer general annotation characters to characters intended specifically for ruby. The proposal can support limited interchange; it shows the relationship between base text and annotation text. Mono ruby and group ruby can be distinguished by segmentation.

Leisher asked: Why not use HTML? Freytag replied that the proposal is intended for use at the lowest level, internal to an implementation. Leisher asked whether the text could be read without ruby. Freytag pointed out that dropping ruby means dropping content, whereas dropping italics (for example) does not entail loss of content.

Hiura said that this proposal is a new approach. Freytag said that the OBJ is also new, added to the Unicode Standard for a similar purpose. The protocol itself carries little information, and ruby could (in principle) be implemented as OBJ. However, this is not as appealing, because you would have to put the base character there as well.

Leisher said that you could put just the record length. Freytag responded that this would complicate the layout of the rest of the line. He pointed out that text that is in the annotation can be as richly formatted as the base text, with the necessity for a second text pump. This complicates things.

Leisher asked whether this proposal was intended for certain platforms. Freytag responded that an implementation is not required to use it.

Hiura said that he does not see why ruby is a special case, that it is just like italics. Freytag replied that ruby has content, and is related to a base character or group of characters. This relationship needs to be visible in plain text.

Suignard compared it to HTML, where you have a tagged (source) view of data and a formatted view. If you remove or ignore the tags, you get a plain text view.

Freytag said that we have encoded characters for implementation needs, e.g. OBJ. The proposal has the added benefit of being able to show text relationships in a plain text environment cheaply, without invoking a full rich text protocol (e.g., HTML).

Hiura asked: if ruby is displayed in a plain text stream, and the user cuts and pastes kanji or ruby, how would it work? Freytag gave the example of clipboard use: With no marking, the clipboard text is simply in internal ordering. With the proposal, the receiving end has options: Take all; strip out markers (resulting in garbled text as is the case today); take out annotation. Hiura said that copy/paste should be driven by the user's intention. Freytag said that choice would be at the paste end, in his experience.
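The three paste-side options Freytag lists can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the minutes give no code points, so the sketch assumes the values later assigned in the Unicode Standard (U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR, U+FFFB INTERLINEAR ANNOTATION TERMINATOR). The function names are hypothetical, and nested annotations are not handled.

```python
import re

# Assumed code points: those later assigned in the Unicode Standard.
ANCHOR = "\uFFF9"      # starts the annotated (base) text
SEPARATOR = "\uFFFA"   # starts the annotation text
TERMINATOR = "\uFFFB"  # ends the annotation text

_MARKERS = ANCHOR + SEPARATOR + TERMINATOR

# One non-nested construct: ANCHOR base SEPARATOR annotation TERMINATOR
_CONSTRUCT = re.compile(
    f"{ANCHOR}([^{_MARKERS}]*){SEPARATOR}([^{_MARKERS}]*){TERMINATOR}"
)


def paste_all(text):
    """Option 1: take everything, markers included."""
    return text


def paste_strip_markers(text):
    """Option 2: drop only the markers; base and annotation run together."""
    return text.translate({ord(ch): None for ch in _MARKERS})


def paste_drop_annotation(text):
    """Option 3: keep the base text, drop the annotation and the markers."""
    return _CONSTRUCT.sub(r"\1", text)


# Hypothetical sample: base text 漢字 annotated with the reading かんじ.
sample = ANCHOR + "漢字" + SEPARATOR + "かんじ" + TERMINATOR + "を書く"
print(paste_strip_markers(sample))
print(paste_drop_annotation(sample))
```

The second option shows the garbling mentioned in the discussion: without the markers, the reading is indistinguishable from the running text.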

Leisher said that most clipboards can handle most formats. What about interpretation? i.e., failure of a secondary application to interpret the text on the clipboard. Sargent said that is why you want a standard.

Barry commented that this proposal has other applications. He said that there are various library cases where it would be useful, e.g., spelling out a "filed as" value for a number.

Hiura asked why the proposal is limited to ruby only, since there are several equivalent forms of annotation in Japanese. If the intent is exchange, how does the receiving side interpret the data? Suignard said that the marker is needed to support higher level protocols. Such protocols need a marker for their work.

Ng asked: If all you need is a marker, do you need the ruby text itself? Freytag replied that the limiting factor is the text pump. Ng said that this seems to be for a very selective application. Freytag replied that the request is to encode characters, and applications can choose to ignore them. Leisher and Honomichl pointed out that you would still have to write code to ignore them.

Freytag said that having these codes makes implementation easier. Sargent said that Word does ruby as fields, but this is not elegant. He added that the proposal does not tell you whether it is ruby or wariuchi.

Hiura asked: Then why don't we reserve hundreds of characters for marking rich text? He said that comments from the Japanese community and W3C had not favored this approach. Suignard questioned whether this approach had been formally discussed by W3C. Hiura replied that the W3C I18N working group had held a discussion and some people had expressed concern.

Freytag said that the answer to concern over conflict with a higher level protocol, e.g., HTML, would be for the specification for the protocol to say "do not use." Umamaheswaran pointed out that this is similar to what is done for BiDi in HTML.

Freytag then responded to Hiura's "very useful" question about many characters for rich text. Although we deal with many types of multimedia objects, we have one character for all multimedia objects because the function is to show the one place of the multimedia object in the text. There could be two types of annotation characters (instead of three) or they could be unified. He advises against individual ones for the various types of annotations in Japanese. Perhaps the types could include a library class of ignorables (for non-filing text). Need support for instream markers on which we build protocols.

Honomichl and Leisher raised the issue of losing an OBJ. Freytag replied that 3 types provides a nice safety net in cases of data corruption. You could use OBJ if you could guarantee that the out of band information was always correct. If you use only one character (OBJ) for everything, you can never nest constructs. There is also the problem of overloading OBJ in order to use it for annotations. Under the proposal, the search parser can search without going out-of-band.
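The "safety net" and nesting arguments can be made concrete with a small well-formedness check. This is a sketch, not part of the proposal: it assumes the code points later assigned in the Unicode Standard (U+FFF9, U+FFFA, U+FFFB) and assumes that every construct has exactly one separator. The function name is hypothetical.

```python
# Assumed code points: those later assigned in the Unicode Standard.
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"


def annotations_well_formed(text):
    """Return True if anchors, separators, and terminators are balanced.

    Nesting is allowed. The check uses only the in-stream markers, so no
    out-of-band information is needed to detect a lost or stray marker.
    """
    # One entry per open construct: True once its separator has been seen.
    open_constructs = []
    for ch in text:
        if ch == ANCHOR:
            open_constructs.append(False)
        elif ch == SEPARATOR:
            if not open_constructs or open_constructs[-1]:
                return False  # separator outside a construct, or a second one
            open_constructs[-1] = True
        elif ch == TERMINATOR:
            if not open_constructs or not open_constructs[-1]:
                return False  # terminator without anchor or without separator
            open_constructs.pop()
    return not open_constructs  # every construct must have been closed


simple = ANCHOR + "a" + SEPARATOR + "b" + TERMINATOR
nested = ANCHOR + "a" + simple + SEPARATOR + "c" + TERMINATOR
lost_anchor = "a" + SEPARATOR + "b" + TERMINATOR
print(annotations_well_formed(simple), annotations_well_formed(nested),
      annotations_well_formed(lost_anchor))
```

With a single marker character for every role, a lost marker could not be told apart from a legitimate one, and the nested case could not be expressed at all.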

Leisher pointed out that the cost is the same. Honomichl agreed with Freytag that some operations would be cheaper.

(Unisys left the room)

Freytag said that the aim of the proposal is to help rich text support. It is not aimed at databases. Ng said that the proposal appears to be useful, but he does not want people who do not utilize it to have to pay a price.

Sargent said that this proposal inspired a recursive way to lay out text which allows you to format mathematical text. All standard math formulations can be represented this way.

Leisher asked: What is the probability of use in a vendor-specific manner? Freytag replied that the Private Use Area can be used unless there is an implied contract that all PUA must be available.

Hiura recommended that the proposal clearly state that this is intended to provide support for a rich text parser. If the aspect of internal representation was stressed, the proposal might have more acceptance. He added that the fall back rendering proposal is controversial, and, if the intent is to have true ruby, double parentheses would be annoying.

Honomichl recommended inclusion of math examples. Freytag asked: If the proposal is not addressed specifically to ruby, would it increase acceptance? Hiura said he thought it would.

(Unisys returned)

Barry noted that ISO 6630, Bibliographic Control Characters, includes annotation control characters, one of three pairs of control characters which have specific functionality. Aliprand suggested that it might be possible to treat all three using the annotation technique, with information in the annotation giving the specific function.

Moved by Freytag, seconded by Honomichl
[#76-M1]: Motion:
To revise document L2/98-099 on the basis of input received at this meeting and publish it as a draft Unicode Technical Report for public comment.

Hiura expressed reservations at public display of a final result that the UTC had not seen. Aliprand pointed out that N1727 is available as a WG2 document, and that it would be better to have an updated version available, with math examples.

Freytag outlined the changes that would be made to L2/98-099:
· Focus on internal representation aspect;
· Change fall back rendering recommendation to make use of the glyphs for the proposed characters (per suggestion from Umamaheswaran);
· Point out that interchange results in loss of information (cf. OBJ);
· Generalize semantics;
· Use for recursive formatting;
· Give examples of use in math rendering (as well as ruby example).

Interlinear annotations
[#76-M1] Motion: To revise document L2/98-099 Support for implementing interlinear annotations as used in East Asian typography on the basis of input received at this meeting and publish it as a draft Unicode Technical Report for public comment
8 for; 0 against; 2 abstentions (Apple by proxy, Sun)
Motion Approved
Action Item 76-11 for Sargent: Send examples of coded math text and the renderings of the text to Freytag. Renderings to be supplied as GIFs.
Action Item 76-12 for Freytag: Revise document L2/98-099 Support for implementing interlinear annotations … on the basis of input received at this meeting. Publish it as a draft Unicode Technical Report on the Unicode Web site. Solicit public comment.

Meeting adjourned at 5:10 pm


Tuesday, April 21, 1998

UTC Membership Roll Call:

PRESENT: Digital Equipment Corporation; Hewlett-Packard Company; IBM Corporation; Microsoft Corporation; Novell, Inc.; Oracle Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Sybase, Inc.; Unisys Corporation
BY PROXY: Apple Computer; JustSystem Corporation
(Total represented: 12)
Quorum = 9

NOT PRESENT: Booz, Allen, Hamilton, Inc.; Mathema Software, GmbH; NCR; Reuters, Ltd.; Silicon Graphics, Inc.; Xerox Corporation. (Total not represented: 6)

BiDi Algorithm -- Document L2/98-149

Sargent said that Microsoft disagreed with the revision of paragraph T6 that Mark Davis had posted on the Web site. The revision changes existing implementations, and introduces interaction between embeddings (specifically, in relation to numbers). The reference implementation follows Version 2.0.

Umamaheswaran said that he had solicited input from IBM experts in Israel.

A mechanism where we can post comments is needed.

Moved by Sargent, seconded by Carroll
[#76-M2] Motion:
To direct Mark Davis to remove paragraph T6 from the updated text of the BiDi algorithm.
11 for; 0 against; 1 abstention (IBM)
Action Item 76-13 for Davis: Edit paragraph T6 of the updated text of the BiDi algorithm as recommended in document L2/98-149.

Newline Guidelines -- Document L2/98-153

Sargent said that information on how CR/LF is used in current software would be useful, i.e., how CR/LF is understood in DOS/Windows, Macintosh, and Unix environments.

Carroll said that the new introduction is more useful.

Action Item 76-14 for Sargent: Provide text on how CR/LF is used in DOS/Windows, Macintosh, and UNIX to Mark Davis for Newline Guidelines.

Umamaheswaran said that he has sent information on EBCDIC to Davis. EBCDIC uses the NL control character from ISO 6429. He pointed out that, in the Unicode Standard, we have not explicitly recognized C1, and asked what the UTC's feelings are on this.

Action Item for Aliprand/Winkler (in 76-1): Put C1 use on agenda for UTC/L2 joint meeting in July.
Action Item 76-15 for Umamaheswaran: Work with Mark Davis to add EBCDIC use of NL to Newline Guidelines (draft UTR)
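As background to the platform differences under discussion, the following Python sketch normalizes the common line-end conventions to a single form. It is an illustration only, not text from the draft Guidelines. It assumes the usual conventions (CR LF on DOS/Windows, CR on the Macintosh, LF on Unix) and assumes that EBCDIC NL has been converted to U+0085, which is a common but not universal mapping. The function name is hypothetical.

```python
import re

# CR LF must be tried first so that it is replaced as one unit
# rather than as two separate line ends.
_LINE_ENDS = re.compile("\r\n|[\r\n\u0085]")


def normalize_newlines(text, target="\n"):
    """Replace CR LF, CR, LF, and U+0085 (assumed EBCDIC NL) with target."""
    return _LINE_ENDS.sub(target, text)


mixed = "dos\r\nmac\runix\nebcdic\u0085end"
print(normalize_newlines(mixed).split("\n"))
```

Normalizing in this way discards the information about which convention the source used, so it suits reading text rather than round-trip conversion.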

Umamaheswaran commented that in the section "Converting from other character code sets," recommendation 2 is unimplementable. Whistler agreed that it is weak.

Whistler said that the background section needs to be expanded with more information about current practice.

Action Item 76-16 for Umamaheswaran: Provide comments on different practices in applications on the same platform with respect to CR/LF to Mark Davis for Newline Guidelines.

Sargent said that the Guidelines should also describe use of CR/LF to improve readability, e.g., in TeX, and in C source code. Umamaheswaran added: And use as record terminator.

Action Item 76-17 for Davis: Revise Newline Guidelines incorporating feedback from Umamaheswaran, Sargent and any other post-meeting comments for July UTC meeting.
Action Item for Aliprand/Winkler (in 76-1): Put revision of Newline Guidelines on agenda for UTC/L2 joint meeting in July.
The draft UTR Newline Guidelines is not yet ready for public review.
By consensus (Unisys absent)

Interaction with SC2, part 1

WG2 meeting #34 (Seattle, WA)

Report from Unicode Liaison
Suignard (as L2 International Liaison) gave a report on the WG2 meeting in Redmond. A large number of scripts were accepted for ISO/IEC 10646, and will be going to PDAM. UTC action is required to keep Unicode and ISO/IEC 10646 in synch. The Chair asked Whistler to lead the discussion on specific scripts.

Resolutions from WG2 Meeting #34 requiring UTC Action

Burmese -- Document L2/98-101 (= WG2 N1729)
The WG2 Ad Hoc meeting on Burmese and Khmer scripts discussed Lee Collins' proposal for Burmese, and resulted in a proposal to WG2 for Burmese script (in WG2 N1729). Whistler recommended reserving comments about aspects of N1729 (e.g., parenthetical additions to names) until PDAM balloting.

Moved by Whistler, seconded by Sargent
[#76-M3] Motion:
That the UTC accepts Burmese script as specified in document L2/98-101.
Unanimous

Khmer -- Document L2/98-101 (= WG2 N1729)
Khmer has been a little more contentious, with two opposing models: using virama to create conjunct consonants vs. explicitly coded conjunct consonants. The virama model was adopted by the Ad Hoc, resulting in consistency with all the Brahmic scripts except Tibetan.

Moved by Whistler, seconded by Sargent
[#76-M4] Motion:
That the UTC accepts Khmer script as specified in document L2/98-101.
Unanimous
Action Item 76-18 for Whistler: Work with Everson and Bauhahn to facilitate acquisition of necessary Unicode information about Burmese script.

Bopomofo Extensions -- Document L2/98-090 (= WG2 N1713R)
This proposal was introduced at the WG2 meeting by TCA. Document N1713R is a revised proposal after extensive discussion at the WG2 meeting. Extensions to Bopomofo have already been anticipated in the Roadmap. The proposal reflects the consensus of mainland China and Taiwan.

Moved by Whistler, seconded by Sargent
[#76-M5] Motion:
That the UTC accepts the Bopomofo extensions and modifier letters as specified in document L2/98-090.
Unanimous

Ideographic Variation Indicator -- Document L2/98-100 (= WG2 N1728)
This topic had been controversial, but it is now clear that the character is intended as a visible graphic symbol, and not as a hidden control code. Its function is as an "enabler" to allow an inputter to get past the problem of an unencoded character.

There was a question about the direction of the slash in the proposed symbol. Suignard said that the GBK font has a forward slash.

Moved by Whistler, seconded by Sargent
[#76-M6] Motion:
That the UTC accepts the ideographic variation indicator as specified in document L2/98-100.
Unanimous

Long asked whether this character is full-width. Whistler said that was probably the case, since it is used to indicate an ideographic variant. 303E was proposed as the code point because the character is derived from GBK.

Action Item 173-19 for Suignard: Use image of ideographic variation indicator from GBK font when preparing US comments on the coming PDAM for the ideographic variation mark.
Action Item 76-20 for Whistler: Compile property assignments for the characters accepted at this meeting.

SC2 Action re Ideographic Description Sequences
WG2 discussed the proposal for ideographic description sequences (N1680), and agreed to register a subproject for it, but did not recommend that it go to PDAM. However, the corresponding SC2 resolution says that it is to go to PDAM ballot. Mike Ksar is pursuing the problem with the SC2 Secretariat.

SC2 Resolution on Ideographic description sequences
Recommendation #1 to L2 Plenary:

That L2 inform the SC2 Secretariat that SC2 Resolution M08.14 (f) was not in accordance with SC2/WG2 Resolution M34.17, and that the US vote was predicated on a false premise due to misinterpretation of the SC2/WG2 vs. SC2 resolutions.
By consensus

Whistler pointed out that these are graphic characters, not structural, and they occur in GBK. They can be used to "spell out" a character; there is a syntax for use. Since they are graphic characters, there is no necessity for canonical equivalences for ideographs.

Ng said that ideographic description sequences might be on the agenda of IRG meeting #11 in Japan, and asked about the UTC's position. Umamaheswaran said that WG2 still has the opportunity to fix the situation. The IRG is free to work on the PDAM text.

Other WG2 actions

Umamaheswaran reported that fixing the collection identifiers was accepted.

The Chair congratulated Michel Suignard on his appointment as project editor for ISO/IEC 10646-2. Suignard said that he needs True Type fonts for the scripts that will be included in Part 2, and mentioned the Western musical symbols proposal.

Action Item 173-21 for McGowan: Supply True Type font for Western musical symbols to Suignard (for use in 10646 editorial work).

Thaana -- Document L2/98-197

Moved by Whistler, seconded by Umamaheswaran
[#76-M7] Motion: That the UTC accepts the repertoire and encoding of Thaana script, minus the RETYU SIGN, as in WG2 Resolution M34.9.
Unanimous

L2 Plenary session convened at 11:30 a.m. Please see separate L2 Minutes.


The joint UTC/L2 meeting (L2 Ad Hoc) resumed at 1:30 p.m.

Technical Issues

New Proposals

Line Breaking -- Document L2/98-151
The subject arose as part of Editorial Committee work on Version 3.0. The draft defines three major types of line breaking opportunities.

Umamaheswaran asked about line breaking opportunities in Indic scripts. Whistler surmised that this would be a special case of type 3 (morphological analysis). He suggested leaving the proposal a little open-ended to accommodate other scripts, e.g. Tibetan.

Action Item 76-23 for Suignard: Supply write-up on use of NBSP in HTML to Freytag for revision of Line breaking properties.

The relationship of line breaking and hyphenation was discussed. Freytag considered the use of character properties for a hyphenation algorithm to be outside the scope of this TR. Hyphenation (not line breaking) determines where hyphenation breaking occurs. Hyphenation hinting is provided by SHY.

Carroll asked: How would I start a line with a space? Freytag replied that the behavior would be the same as today.

Umamaheswaran suggested examination of soft controls, e.g., OBJ. The term "soft controls" does not include C0 and C1.

Moved by Freytag; seconded by Umamaheswaran
[#76-M8] Motion:
That document L2/98-151 Line breaking properties be made into a draft Unicode Technical Report (incorporating comments from this meeting) with the intent to progress it to a Unicode Technical Report at the next UTC meeting.
Unanimous
Action Item 76-24 for all members: Provide additional feedback on document L2/98-151 Line breaking properties to Freytag. To be used in next revision, must reach Freytag before May 8.
Action Item 76-25 for Freytag: Revise document L2/98-151 Line breaking properties incorporating comments from this meeting and any others received later to create a draft Unicode Technical Report. Post on Web site for public review.
Action Item for Aliprand/Winkler (in 76-1): Put revision of Line breaking properties (draft UTR) on agenda for UTC/L2 joint meeting in July.

Feedback on Version 2.0
The Editorial Committee requested feedback for improvements to Version 3.0. Freytag listed the areas of interest: ignorance, falsehoods, or gaping holes; glyph problems; keep in mind a desire to align with 10646; and, dense passages.

Leisher suggested publishing the tables separate from the text. Other suggestions were: Review of the annotations in the names list; the usefulness of printed Han charts with cross-references was questioned; a separate "UniHan companion."

Specific Scripts
Additional Non-Ideographic Characters

Proposal for Additional Mathematical and Technical Symbols -- Document L2/98-093
Document L2/98-093 is an extract from a proposal from a consortium of scientific societies and scientific/technical publishers (STIPUB). Sargent considered one of the main issues to be glyphic variants.

Action Item 76-26 for Sargent and Carroll: Work with submitters of L2/98-093 with the target of submitting a proposal on math symbols at the July UTC.
Action Item for Aliprand/Winkler (in 76-1): Put proposal on math symbols on agenda for UTC/L2 joint meeting in July.

Yi PDAM 14
While working on Yi script for Version 3.0, Whistler found problems with the PDAM text that had been approved: correlation of character name with character glyph; typographical errors in names; problems in ordering of characters. Bruce Paterson has fixed some of the problems with names.

PDAM 14 ballot on Yi script
Recommendation #2 to L2:
That L2 propose withdrawal of the current text of the PDAM-14 ballot on Yi script so that technical errors can be corrected before the ballot.
By consensus

"East Asian Width" Property -- Document L2/98-155
Freytag defined and gave examples of the six categories: Narrow, Half-Width, Ambiguous, Full-Width, Wide, and Unassigned. When you interact with East Asian legacy character sets, or are dealing with East Asian typography, you need to be aware of these categories.

The counterpart of "Half-Width" is "Wide." The counterpart of "Full-Width" is "Narrow." The "Ambiguous" category corresponds to a superset of characters from known legacy encodings.
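The categories can be inspected with Python's standard `unicodedata` module, as in the sketch below. Two caveats apply: the module reflects the property as later published (where the sixth value is called Neutral rather than Unassigned, as far as the published annex goes), and the column-count rule for Ambiguous characters is this sketch's own simplification, not something taken from document L2/98-155. The helper name `display_columns` is hypothetical.

```python
import unicodedata

# Values returned by unicodedata.east_asian_width():
# "Na" Narrow, "H" Halfwidth, "A" Ambiguous, "F" Fullwidth, "W" Wide, "N" Neutral
samples = {
    "A": "LATIN CAPITAL LETTER A",
    "\uFF21": "FULLWIDTH LATIN CAPITAL LETTER A",
    "\uFF71": "HALFWIDTH KATAKANA LETTER A",
    "\u30A2": "KATAKANA LETTER A",
    "\u03B1": "GREEK SMALL LETTER ALPHA",
}

for ch, name in samples.items():
    print(f"U+{ord(ch):04X} {name}: {unicodedata.east_asian_width(ch)}")


def display_columns(ch, east_asian_context):
    """Resolve a character to 1 or 2 columns (simplified assumption).

    Fullwidth and Wide characters take two columns; Ambiguous characters
    take two only when the text is treated as East Asian legacy data.
    """
    width = unicodedata.east_asian_width(ch)
    if width in ("F", "W"):
        return 2
    if width == "A":
        return 2 if east_asian_context else 1
    return 1
```

The pairing described above is visible in the samples: the fullwidth Latin letter has a narrow counterpart, and the halfwidth katakana letter has a wide counterpart.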

Hiura said that JIS was planning to delete half-width katakana, and he was concerned about this.

Freytag said that the issue to consider was the default property "Wide" versus the resolved property "Wide." The resolved property "Wide" maps to full-width in a legacy character set.

Hiura asked about mapping from the half-width katakana in Unicode to a future JIS standard that lacks half-width katakana. Freytag suggested that perhaps you would map the half-width katakana to the only equivalents (which are full-width) first, before mapping other full-width characters.

Moved by Freytag; seconded by Umamaheswaran
[#76-M9] Motion:
That document L2/98-155 A new Unicode Character Property "East Asian Width" be made into a draft Unicode Technical Report (incorporating comments from this meeting) with the intent to progress it to a Unicode Technical Report at the next UTC meeting.
10 for, 1 against, 0 abstentions
Action Item 76-27 for Freytag: Revise document L2/98-155 East Asian width property incorporating comments from this meeting and any others received later to create a draft Unicode Technical Report. Post on Web site for public review, with the intent to progress it to a Technical Report at the July UTC meeting.
Action Item for Aliprand/Winkler (in 76-1): Put revision of East Asian width property (draft UTR) on agenda for UTC/L2 joint meeting in July.

Review of Action Items

Action Item 76-28 for Aliprand: Contact Jenkins about Action Items 72-08, 169-09.
Action Item 76-29 for Aliprand: Distribute revised UTC procedures at July UTC meeting.
Action Item 76-30 for Aliprand: Contact Goldsmith re IANA charset registration.
Action Item 76-31 for Editorial: Review errata re Japanese quote characters.
Action Item 76-32 for Ksar: Consider UTC request that enclosing triangle proposal be on agenda for WG2 meeting in September.

Meeting adjourned at 5:50 p.m.


Wednesday, April 22
UTC Membership Roll Call:

PRESENT: Digital Equipment Corporation; IBM Corporation; Microsoft Corporation; Oracle Corporation; The Research Libraries Group, Inc.; Sun Microsystems, Inc.; Sybase, Inc.; Unisys Corporation
BY PROXY: Apple Computer, Inc.; JustSystem Corporation
(Total represented: 10)
Quorum = 9
NOT PRESENT: Booz, Allen, Hamilton, Inc.; Hewlett-Packard Company; Mathema Software, GmbH; NCR; Novell, Inc.; Reuters, Ltd.; Silicon Graphics, Inc.; Xerox Corporation. (Total not represented: 8)

Approval of Minutes
Document L2/98-070

Moved by Winkler, seconded by Umamaheswaran
[#76-M10] Motion:
To approve the Minutes of UTC#75/L2#172 joint meeting (document L2/98-070) as amended.
Unanimous

9:45 a.m. Hewlett-Packard and Novell representatives arrived. (Total represented: 12)

Action Item 76-33 for Oesterle: Add WG3 and possible special meeting of SC2 to September 1998 calendar.

Specific Scripts (continued)

Whole Scripts

Character Properties for Syriac Script -- Document L2/98-156
This document contains two appendices from the WG2 proposal for Syriac script. Appendix E is the proposed character properties for Syriac script.

Whistler summarized the proposed properties. The BiDi Ad Hoc meeting determined that combining marks should be Other Neutral. The combining marks shown in the proposal as R-L should now be Other Neutral. The combining class values appear to be correct.

Action Item 76-34 for Aliprand: Send soft-copy of Paul Nelson's Syriac properties file to Whistler and Umamaheswaran.

Whistler, as editor of the properties database, does not wish to have the UTC approve character properties piecemeal. He recommended approval in principle as each script is added, with final approval as part of the approval of the text of Version 3.0.

Moved by Whistler, seconded by Winkler
[#76-M11] Motion:
That the UTC accepts the Syriac script properties proposed in document L2/98-156 in principle for inclusion in the next version of the Unicode Standard, subject to approval of the final text of Version 3.0 by the UTC.
Unanimous
Action Item 76-35 for Editorial Committee: Review Syriac shaping proposal.

Mongolian -- Document L2/98-104
Whistler reported on work on Mongolian that occurred during the WG2 meeting. The Chinese delegation included two representatives from Inner Mongolia. China presented a new revision.

Whistler met with the Inner Mongolians, and wrote up an analysis. The majority of the Mongolian proposal has been stable for at least a year and a half. The repertoire and the model for use of the repertoire are, from the Unicode perspective, ok. The model for use is like Arabic turned on its side, with some additions.

The outstanding issue is still the control character for exceptional formatting, but there was a meeting of minds on this. The Mongolians still want a Mongolian space character. Whistler suggested NBSP with a Mongolian font. The Mongolians also want a vowel separator.

Mongolian can have more than one variant form in the same location. This can occur in what otherwise would be plain text. Variant marks (to differentiate the variant forms) are needed to preserve use of the basic model for presentation. Two variant marks are definitely needed, with a third to cope with rare cases.

Sargent suggested that the variant marks for Mongolian belong with the general discussion of variants. Whistler pointed out that the variant marks issue is a blocker to progress on Mongolian for the UTC, since all other Mongolian characters are not controversial. The Chinese NSB is looking to go straight to Final PDAM (FPDAM) at the WG2 meeting in September.

Freytag pointed out that Mongolian is a case where every variant form is known. The East Asian case is different; it is open-ended. Making the variation mark specific to a script would make it easier. Sargent said that a general solution would solve both the Mongolian and the East Asian problem.

Whistler agreed with Freytag. We can develop a general solution, but we also need a catalog of what the variants are. The variants are known for Mongolian; they can be pulled out of a data table as a complete set.

Umamaheswaran asked: Is a cataloged set of things the same as encoding characters in a standard? Whistler replied that the use of variant marks (together with a catalog) allows you to utilize the basic model for Mongolian presentation.

Umamaheswaran said that the principles underlying additions to the basic architecture need to be stated. For the known problems with ideographs, these are (a) case for variant mark; (b) must have predefined set of variants.

Freytag pointed out that this is a disunification issue, i.e., the advantage of the variant mark over disunification of Mongolian. By having the variant mark apply to a specific repertoire, it can imply a specific catalog (and no requirements for other catalogs). A general variation mark, on the other hand, implies conformance liability for all variations. If we disunify, how many variation marks are required for basic functionality of the script? What exactly is the cost of an added variation character?

Whistler estimated the number of variation marks needed for BMP scripts as 3 for Mongolian and 1 for Tibetan, plus more for ideographs. Hiura said that 128 would be sufficient for ideographs.

Hiura announced that a full catalog of CJK variants has been compiled. The revised proposal will propose tags rather than marks. The maximum number of variations found is 80, so 128 is a reasonable number.

Whistler suspected there may be one or two other cases in BMP scripts. Outside the BMP, the need for variant marks for Egyptian hieroglyphs and Mayan is almost certain. He estimated a maximum of 2-3 marks per script, and 12 scripts.

Umamaheswaran suggested an analogy with combining sequences. Sargent asked about the interaction of the variant mark with fonts in an implementation. Hiura wondered whether variant tagging could be used.

Whistler pointed out the need for approval of a solution for Mongolian by September. Mongolian is a living script, and two NSBs support it. Freytag suggested that the UTC reserve options for a more unified approach.

Whistler summarized the proposal for Mongolian: variant mark n follows the character. The variant mark is treated as a combining character. Umamaheswaran suggested the sequence: character, variant marker, reference number (for catalog entry). Freytag pointed out that some catalogs are normative and not a matter of taste; others are not normative (e.g., a catalog of ampersands).
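[Illustration: the model summarized above (a variant mark following the character, resolved against a catalog) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the proposal's actual design: the three private-use code points stand in for variant marks that had not been assigned, U+1820 is used only as a sample base character, and the catalog entries and glyph names are hypothetical.]

```python
# Hypothetical variant marks: private-use code points stand in for
# whatever code points would actually be assigned (assumption).
VARIANT_MARKS = {"\uE000": 1, "\uE001": 2, "\uE002": 3}

# Hypothetical catalog: (base character, variant number) -> glyph name.
# The entries and glyph names are invented for illustration only.
CATALOG = {
    ("\u1820", 1): "a.variant1",
    ("\u1820", 2): "a.variant2",
}


def resolve_variants(text):
    """Pair each character with the variant number that follows it.

    A variant number of 0 means no variant mark followed the character.
    """
    result = []
    for ch in text:
        if ch in VARIANT_MARKS:
            if not result:
                raise ValueError("variant mark with no preceding character")
            base, current = result[-1]
            if current != 0:
                raise ValueError("more than one variant mark on a character")
            result[-1] = (base, VARIANT_MARKS[ch])
        else:
            result.append((ch, 0))
    return result


def glyph_for(base, variant):
    """Look up the cataloged form; fall back to the default form otherwise."""
    return CATALOG.get((base, variant), f"default:U+{ord(base):04X}")
```

[For example, resolve_variants("\u1820\uE001") pairs the base character with variant number 2, and glyph_for then returns the cataloged name for that pair. A pair that is not in the catalog falls back to the default form, which is one possible reading of the "outside the range" behavior; the minutes do not specify it.]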

Leisher said that implementation is easier with script-specific information; otherwise, you overload the font. Carroll agreed with Leisher, and pointed out that OpenType has registered tags for variants.

Whistler noted that different environments (Mongolia vs. China) might use subsets of the Mongolian variants. Freytag said that, for Mongolian, the Consortium will have to publish a catalog of the variants.

Carroll said that the character nature of the ampersand is the Unicode character AMPERSAND. It is the wrong use of the character encoding scheme to indicate ampersand variants; these are glyphic differences.

Suignard proposed that the two views on variant tagging (script-specific variant marks vs. a generalized approach) be presented at the July UTC meeting.

Leisher commented that general variation marks are still going to be expensive: an implementation has to know that a general variation applies to the font. Sargent and Hiura suggested that if the number is outside the range for the font, it is not a problem. Freytag said that this would prevent growth of the variant catalog.

Freytag suggested that Sargent prepare a proposal on the alternatives for the July UTC meeting. Hiura, Umamaheswaran, and Carroll volunteered to work on this with Sargent.

Action Item for Aliprand/Winkler (in 76-1): Put these topics on agenda for UTC/L2 joint meeting in July.
· proposal on variant tagging (Hiura & Kobayashi)
· variation marks for Mongolian and general extension (Sargent)
· presentation on glyphs and font issues (Carroll)
Action Item 76-36 for Sargent (lead), Hiura, Umamaheswaran, and Carroll: Prepare discussion paper on architectural alternatives to designate variations, laying out pros and cons. Consider previous related proposals (variant tagging; transcoding). To be available at least 2 weeks (preferably 1 month) before July UTC.
Action Item 173-37 for Winkler: Send L2 reference number for August 1997 proposal on variant characters from Sun & Justsystem to "unicore".
Action Item 76-38 for Carroll: Prepare overview of glyphs and font issues for presentation at July UTC meeting.
Action Item 76-39 for Hiura: Provide information on bounding of Han, and when it will be publishable.

Korean Bangjeom Tone Marks -- Document L2/98-148
Whistler prepared this document for the WG2 meeting. It clarifies the use of existing characters, as a response to Korean comments.

Moved by Honomichl, seconded by Umamaheswaran
[#76-M12] Motion:
To endorse document L2/98-148, expert contribution in response to WG2 N1599 with the suggested addition, and forward it as a joint Unicode/US position to SC2/WG2 for processing as an editorial corrigendum to ISO/IEC 10646.
Unanimous for UTC
Recommendation #3 to L2

Interaction with SC2

IRG Meeting #11

The Officers appointed Nelson Ng (Oracle) as Head of delegation, with Hideki Hiura (Sun) and Michael Kung (Microsoft) as alternates.

Action Item 76-40 for Ng: Contact John Jenkins before IRG meeting to find out what he needs re Vertical Extension A characters (fonts and data for the Unicode database).
Action Item 76-41 for Ng: Coordinate with alternate representatives (Hiura, Kung) to ensure that the delegation expresses a consistent position.
Action Item 76-42 for Ng: Prepare report on IRG meeting #11 highlighting items of significance to the Unicode Consortium.

Closing

On behalf of the UTC, the Chair thanked Ed Hart for his incredible service to both the UTC and L2 over the years.
Endorsed by acclamation.

Hart agreed to maintain the X3L2 list for the time being, and to notify L2 if he needs to discontinue hosting.

Action Item 173-43 for Winkler: Send Hart an updated list of L2 members for revision of the X3L2 distribution list.

Winkler said that the L2 name and address list is on the Unicode Web site in the members only section.

Action Item 76-44 for all: Send ideas about UTC/L2 section of Unicode Web site to Freytag.

Joint UTC/L2 meeting (L2 Ad Hoc) adjourned. Continuation of L2 Plenary commenced (see separate Minutes).


 

ATTACHMENT I

UTC #76 & L2 #173 Joint Meeting -- Attendees

Joan Aliprand; RLG; Joan_Aliprand@notes.rlg.org
Randy Barry; U.S. Library of Congress; rbar@loc.gov
Don Carroll; Hewlett-Packard; dcarroll@sea.hp.com
Asmus Freytag; Unicode/AFII; asmus@unicode.org
Edwin Hart; SHARE, Inc.; Edwin.Hart@jhuapl.edu
Hideki Hiura; Sun Microsystems; hiura@eng.sun.com
Lloyd Honomichl; Novell; lloyd_honomichl@novell.com
Mark Leisher; CRL/New Mexico State University; mleisher@crl.nmsu.edu
Wei-man Long; Digital Equipment Co.; longman@zk3.dec.com
Nelson Ng; Oracle Corp.; nng@us.oracle.com
Murray Sargent; Microsoft; murrays@microsoft.com
Michel Suignard; Microsoft; michelsu@microsoft.com
V.S. Umamaheswaran; IBM; umavs@ca.ibm.com
Ken Whistler; Sybase; kenw@sybase.com
Arnold Winkler; Unisys; arnold.winkler@unisys.com

 
