L2/05-352
Subject: TR 29 and Myanmar/Khmer
Date: Wed, 26 Oct 2005
From: Martin Hosken

Dear All,

Thinking further about Myanmar and Khmer with respect to UTR 29, I would like to propose the addition of a new property to the set, for the purpose of cluster identification. I don't care what it's called, but I'll call it "Propagating" in this discussion. A Propagating character has the properties of Extending and also imparts that property to the following character.

This seems a natural solution to the question of how to identify cluster breaks in the presence of a virama. For example, with U+1000 U+1039 U+101F U+102F (hku), the current algorithms would cause a cluster break between the U+1039 and the U+101F, right in the middle of a diacritic ligature. That is the worst place for a cluster break; almost anywhere else in the string would be better! In the case of ZWNJ, as in U+1000 U+1039 U+200C, the break would again occur. Notice also that in Myanmar it is possible to follow a ZWNJ by a diacritic, as in U+1004 U+1039 U+200C U+1037. The proposed approach would meet the need nicely.

There is one situation where the approach works, but not in the way we might expect: kinzi. U+1004 U+1039 U+1000 results in a base character U+1000 followed by a kinzi. But this is still a base plus diacritic combination, and so no cluster break should occur. The proposed approach correctly identifies the lack of a break.

UTR 29 would need to add boundary rules:

    X × Propagating
    Propagating × X

The one problem with this whole proposal is that there is no way to identify the class of characters covered by Propagating from the categories we currently have for characters. Either this would have to become a non-derived property or we have to do the unthinkable and add a general category!
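
To make the effect of the two proposed rules concrete, here is a minimal Python sketch, not a UTR 29 implementation. It makes several assumptions of its own: the set PROPAGATING holds only U+1039 (the real membership is exactly the open question raised above), Extending is approximated by general categories Mn and Me, and every other UTR 29 rule (CR LF, Hangul and so on) is ignored. The names PROPAGATING, is_extend and cluster_breaks are hypothetical.

```python
import unicodedata

# Assumption: only the Myanmar virama is treated as Propagating in this sketch.
PROPAGATING = {0x1039}


def is_extend(ch):
    """Rough stand-in for the Extending property: nonspacing or enclosing marks.

    This is an approximation chosen for the sketch, not the real derived property.
    """
    return unicodedata.category(ch) in ("Mn", "Me")


def cluster_breaks(text, use_propagating=True):
    """Return the indices i at which a cluster break falls before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if is_extend(cur):
            continue  # existing behaviour: no break before an Extending character
        if use_propagating and ord(cur) in PROPAGATING:
            continue  # proposed rule: X × Propagating
        if use_propagating and ord(prev) in PROPAGATING:
            continue  # proposed rule: Propagating × X
        breaks.append(i)
    return breaks


hku = "\u1000\u1039\u101F\u102F"
print(cluster_breaks(hku, use_propagating=False))  # break after the virama
print(cluster_breaks(hku, use_propagating=True))   # no break inside the cluster
```

Under these assumptions the first call reports a break at index 2, between U+1039 and U+101F, and the second reports none. The ZWNJ and kinzi sequences mentioned above behave the same way.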

I am assuming that such a category would also be appropriate for the halant in most of the Indic scripts, even if sometimes it would mark that a non-conjoining sequence should not be broken. But then we start getting into a much finer level of detail to resolve such issues.

Yours,
Martin Hosken

PS. Should this email become an L2 document and go on the agenda?