Dec 13 2011 Meeting Minutes




- Actions:
    - Contribute transform engine for SRX rules to CLDR for the Dec 2011 release. --> Kevin (Open)
    - Difficult to generate custom break iterator.
    - SRX is simply an exchange mechanism.
    - Tested against a large number of software strings. Results were very good.
    - IBM doing something very similar.

    - Get everyone to provide input to the SRX file contributed by Rodolfo. (Open)
    - Provide IBM's supplementary input for additional languages to the default SRX file --> Helena (Closed)
    - David to connect with Helena to solicit input about ULI PR with the Multilingual-Web LT activities. --> David Filip (Closed)
        - New interested parties are: Intel and Adobe.
    - Arle still owns the separator character proposal. It needs to be fleshed out further. --> Arle Lommel (Open)
    * business scenario input by Xmas
    * Final draft ready for TC review by Xmas as well
    * Need to finalize by Jan 13.
- Liaison to CLDR TC report back (Kevin)
    * Cc'ed ULI on the draft of the report. CLDR is very receptive to the proposal to refine the sentence break.
    * Would like to have the proposal refined into a more CLDR-like proposal: what should be modified and what should be included. The idea is to include as many reasonable abbreviations as possible that would otherwise cause a break.
    * The integration into LDML should not be very difficult. Would like to keep the localized stuff at the right place.
    * CLDR December date has been moved out.
    * Suggested adding exceptions to the default CLDR set.
    * Went through examples: took the contributed SRX file, converted it to LDML, and ran the vanilla sentence break over a sample of text. Listed some common examples.
    * Leave open the conversation for us where everything gets organized
    * Also mentioned pipeline vs. monolithic segmentation, which is related to the UTC proposal Arle is working on. Acknowledged that the original proposal was not as well received, but a re-orientation may help.
    Arle: Maybe back to adding the new characters?
    Kevin: SRX may not be our bid. Making it possible to create custom break iterators is our goal. If it's monolithic within a single system, there is no need to create characters for that behavior.
    Helena: Would rather scope it down to within a single system first.
    Arle: When we indicate segmentation to a downstream process or, conversely, indicate a manual change to the segmentation in the process to correct the default, that information can be passed down even if it is not being used to generate the segmentation behavior on the fly.
    Kevin: The way I framed it is that localization tools use their own rules to process the information. The idea is that we would approach this in two ways: 1. improve ICU break behavior; 2. allow extensions on top of the default. The main goal is to achieve the normative effect of segmentation. The character proposal is still important.
    Helena: Have you made a proposal and presentation to UTC?
    Arle: I have not yet.
    Christian: The proposal was just described as being for the localization industry. We should not limit it to the localization industry. It's a general language issue, and once a decision is made on how to represent segmentation, we should avoid positioning it too narrowly in the localization/content areas.
    Helena: Need real business examples and scenarios.
    Kevin: Agree we should have more types of examples and bring the discussion to a wider context. Several rules in the default SRX, such as "d." and "a.", are more ambiguous than others. Need a systematic review of our exceptions. It's better not to break than to break.
    Helena: need to check if Arle is on the UTC mailing list. Will invite John Emmons to come share CLDR overview and process.
    David: See lots of value in improving ICU behavior and in good UAX29 SRX exceptions. I believe the work of optimizing and compiling rules may be too big. We should separate these tasks from the characters, which should be the last consideration. The scope of the char should be considered if we do go for it: is it just for plain text, or a broader scope? I wonder how this is supposed to improve things. If we just have the character, what was the rule that made you put the char there? It works if it's just one cycle. This can become a hygiene issue, with the char ending up all over the place. For this, the markup solution might be better. Hope there is a good solution for 2.0.
    Helena: char pair (joining and separating) is intended just for plain text and not for XML based content.
    David: this limited scope is better. Would be very interesting to see when something leaves markup env and enters into plain text how this should be handled. The machine time spent is huge. Good to be able to hard code the break with a single character but there will be ambiguity.
    Helena: I'd rather the communication of metadata to be outside of this scope.
    Arle: Conveying the metadata will live outside of the scope of this discussion.
    Kevin: Processors can pass these downstream if the break behavior is self-contained and they are totally isolated from each other.
    Christian: Question: we talked a bit about the procedures that are applicable. If one of us goes out to other TCs, what are the steps?
    Helena: We had gone over Kevin's proposal in the last meeting and agreed on the logistics.
    David: One more question on process: is there any processing requirement or blueprint for characters?
    Helena: Is there any public information that can be shared? Arle?
    Arle: There are no explicit processing requirements, but there is a section on character properties, and requirements can be added to the usage section.
    David: there must be some control character with processing requirements.
    Arle: that's covered in the character property section. Need to share additional pointers.
    Helena: Let's bring that offline.
    David: We might need to define it more than what the Unicode character proposal does. There is a Unicode report and a W3C directive on the behavior of certain characters in the markup and non-markup env. We should describe how the char should behave when it enters the markup env.
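
Illustration of the exception-list idea discussed above, where known abbreviations suppress a break that the default rules would otherwise make. This is only a minimal Python sketch under stated assumptions, not ICU, CLDR, or SRX code: the function name `split_sentences`, the `EXCEPTIONS` set, and its entries are hypothetical, and the break rule used here (terminal punctuation followed by whitespace) is far simpler than UAX #29.

```python
import re

# Hypothetical exception list; real entries would come from the SRX/CLDR data.
EXCEPTIONS = {"Mr.", "Dr.", "e.g.", "etc."}


def split_sentences(text, exceptions=EXCEPTIONS):
    """Break after '.', '!' or '?' followed by whitespace,
    unless the token ending at the candidate break is a listed exception."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate_end = match.start() + 1  # position just after the punctuation
        tokens = text[start:candidate_end].split()
        last_token = tokens[-1] if tokens else ""
        if last_token in exceptions:
            continue  # "better not to break than to break"
        sentences.append(text[start:candidate_end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "Dr. Smith arrived. He sat down."
print(split_sentences(sample))                    # with the exception list
print(split_sentences(sample, exceptions=set()))  # default rule only
```

With the exception list, "Dr." no longer ends a segment; without it, the default rule breaks after the abbreviation. Ambiguous entries such as "d." or "a." are the cases where a list like this needs systematic review.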

- Liaison to UTC report back (Arle)
    * Helena will take action to figure out if Arle is on UTC mailing list so he can make a proposal to UTC.

- Liaison to XLIFF activities (David)
    * Oracle rejoined the group.
    * Andrew P. from Welocalize is joining. He has good ideas on processing requirements in XLIFF.
    * Apple would be interested also.
- MLW-LT will formally exist in Jan 2012
    * A large part of the work will be making a reference implementation of the metadata proposal. It is somewhat related.

- Promote and encourage participation. CLDR overview.
    * Helena will give a short article with TC feedback by end of this week for David F. and Arle.
    * David F. wants ULI, XLIFF and MLW-LT articles.