ULI TC Jul 12 2011



Roozbek (Gnome)








Takashi (Teradata)


Kenneth Tang (Oracle)

Lisa Moore (IBM)

David Filip (XLIFF)


-          Charter, process and roadmap walkthrough

-          Other topics:

o       Linkage area update: XLIFF and GALA

o       Unicode segmentation character proposal

o       Default segmentation review needed


Charter, process and roadmap walkthrough

Uwe walked through p3-5.

A concern was raised around interoperability: the goal is to work with other open standards organizations. Instead of overloading other standards with changes, we will work with the various standards groups and refer to their implementations or standards.

Lisa: If you find changes to TC procedure, let Lisa know.

Roadmap: XLIFF v2.0 reference. The XLIFF process feature-tracking wiki also has a goal for segmentation.

David: There are some prototypes, but nothing is settled or stable. The good thing is, if the component is developed. If an experimental OKAPI schema is being developed, it is still at an early stage.

Helena:  Multiple reference implementations are possible and should be encouraged but there needs to be a focus to ensure there is consistency among these implementations.

David: User profile can be beyond the hard standardization. The profiles can become interesting and useful.

Helena: The focus should also be on the compliance of the various reference implementations so they can be validated.

Mati: There should be a complementary component of the standard to provide a certification suite to validate the behavior.

Kevin: Need to have a process for localization to work end-to-end. Need tests and safeguards to ensure predictable behavior for an end-to-end localization process.

Action: Uwe will adjust our charter to include validation of compliance with the standard.

David: For v2.0, we in the XLIFF TC committed to processing requirements for each element and attribute. It should be possible to design a good test suite for XLIFF 2.0.

Christian: Operating procedure: if not already started, we might work on a couple of definitions. 1. A clear understanding of what we consider localization data: the initial focus is on segmentation, which is key to translation memory, and we want to make sure we have a good handle on system-to-system interoperability.

David: Need to avoid conflicts with XLIFF TC meetings.

Takashi: When we say localization, we cover both software and documentation. Are we thinking about the formatting issues?

Helena: That is an OASIS issue, not our problem to solve.

Kevin: This gets back to the reference implementation discussion. Processing and operating inside the content once it's converted is more our focus.

Christian: The linguistic data is very valuable. Unit detection is important and should be clarified. However, the plain text or canonical representation is also important.

Helena: That's covered under the normalization process.

Christian: Inline markup activities: we have an interoperability problem stemming from differences in inline tagging. This causes the majority of the interoperability issues.

Kevin: Vendor tools process content from memories differently.

Helena: Need clear scope. It is obvious the tagging issues are pervasive but segmentation is the initial focus. The memory concerns will be worked on once the segmentation piece is more stabilized.


Other topics:


    * Unicode character proposal: segmentation break character. Leave the break type unspecified, so that it is just a general segmentation specifier. The purpose of the segmentation marker is to identify a break point in a run of text.

     Mati: Does that mean the receiving and sending ends may have different interpretations?

     Arle: it can be used in NLP and memories.

     Kevin: It can be used in plain text, defining a standard character for segmentation.

     David: Just one for everything, e.g. paragraph and sentence.

     Helena: Leave the purpose of the break level outside of the character definition itself.

     Kevin: Also a great argument to content hierarchy if more attributes are included.

     Helena: probably another OASIS problem.

     Arle: good enough to start.

     Christian: Is there a purpose for this character?

     Arle: action to create a proposal to UTC

     Lisa: The properties and the scope in which it will be used should be included in the proposal. There are some break characters already. How unique will this be? Is it a control (for formatting)? In what domain would it be used, and when? It would be good to participate in UTC and CLDR meetings and call in when proposals are discussed.

     Kevin: It might be good to include the segmentation mark type (word, phrase, sentence, etc.) in the proposal. Align to UAX#29. These can also work in a transliteration engine. If you know the type is a sentence boundary, you can modify the behavior with these tags.

     Arle: It might be good to have a generic one.

     Lisa: Comment on timing: we should have the proposal by Jul 25 for the Aug UTC. Otherwise, it will be the Nov UTC.

     Arle: my schedule works better for Nov UTC.

     Mattias: what is the nature of this segmentation proposal?

     Kevin: In this case, we are talking about it as a delimiter, so it does not need to be balanced.
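
     As a rough illustration of the delimiter idea discussed above, the sketch below treats a single marker as a break point in a run of plain text. No code point has been proposed or assigned, so the private-use character U+E000 and the helper names split_segments and join_segments are assumptions made only for this example.

```python
# Hypothetical stand-in: no code point exists for the proposed character,
# so a private-use character (U+E000) plays the role of the marker here.
SEGMENT_MARK = "\uE000"


def split_segments(text: str, mark: str = SEGMENT_MARK) -> list[str]:
    """Split a run of text at each marker, dropping empty pieces."""
    return [piece for piece in text.split(mark) if piece]


def join_segments(segments: list[str], mark: str = SEGMENT_MARK) -> str:
    """Put one marker between adjacent segments (a delimiter, not a pair)."""
    return mark.join(segments)


marked = join_segments(["First sentence. ", "Second sentence. ", "Third."])
print(split_segments(marked))
```

     Because the marker is a delimiter, three segments need only two markers and nothing has to be opened or closed. The marker says nothing about whether a break is a word, sentence, or paragraph boundary, which matches the generic option raised in the discussion.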


    * Segmentation rules: a review by Apple is under way, and more details will be shared later. The rules will also be consolidated with UAX#29.


Action Items:

- Uwe to update the charter to include compliance test suite

- Arle to create the character proposal for the Nov UTC and Kevin to review.

- Everyone: those who are interested in being the rep to UTC or CLDR, please email Helena.