Re: New version of TR29:

From: Samphan Raruenrom (samphan@thai.com)
Date: Fri Aug 16 2002 - 05:23:24 EDT


Mark Davis wrote:
> There is a new version of Unicode Technical Report #29: Text Boundaries on
> <http://www.unicode.org/reports/tr29/>, covering grapheme-cluster, word and
> sentence boundaries. There are significant modifications to this version;
> for a summary, see <http://www.unicode.org/reports/tr29/#Modifications>.
> This is a draft version, not a final version. There are a number of open
> issues remaining. Feedback is welcome
> Feedback that is received before the UTC meeting (starting August 20) can be
> made available for the discussion of TR29 at that meeting.

FYI:
There is an open issue regarding grapheme-cluster boundaries in Thai.

* SARA AM as an Other_Grapheme_Extend?

Should "0E33;THAI CHARACTER SARA AM" be a GraphemeExtend
character or not?

By Unicode definition, SARA AM has General Category Lo; it is not a
combining character. However, many Thai applications (MS Office, Windows,
OpenOffice.org) treat SARA AM like a combining character (unlike SARA
AA), i.e. the cursor always jumps over it. Whether this is right or not is
controversial, but the fact is that Windows users are used to it.

My personal question is: if it is preferable for Thai to treat
SARA AM as part of the previous grapheme cluster, could the
UTC consider adding SARA AM to Other_Grapheme_Extend?
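
To illustrate the question, here is a minimal Python sketch. It is not
the TR29 algorithm: it approximates "extending" characters by General
Category Mn/Me only, and the function name `clusters` and the parameter
`extra_extend` are hypothetical names introduced for this example. It
shows how the cluster count for one Thai word would change if SARA AM
were added to the extending set.

```python
import unicodedata

SARA_AM = "\u0E33"  # THAI CHARACTER SARA AM, General Category Lo

def clusters(text, extra_extend=frozenset()):
    """Simplified clustering (an assumption, not TR29): a character joins
    the previous cluster if it is a combining mark (Mn/Me) or is listed
    in extra_extend; otherwise it starts a new cluster."""
    out = []
    for ch in text:
        extends = unicodedata.category(ch) in ("Mn", "Me") or ch in extra_extend
        if out and extends:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# NO NU + MAI THO + SARA AM (the Thai word for "water")
word = "\u0E19\u0E49\u0E33"

print(unicodedata.category(SARA_AM))    # Lo
print(len(clusters(word)))              # 2: SARA AM starts its own cluster
print(len(clusters(word, {SARA_AM})))   # 1: SARA AM joins the previous cluster
```

With SARA AM in the extending set, a cursor moving by cluster would step
over the whole word at once, which matches the application behaviour
described above.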

---
I also notice that Grapheme_Link has been removed from the grapheme-cluster
definition. This is appropriate for Thai because PHINTHU should not
cause two grapheme clusters to be linked together.

-- Feel free to disclose the contents of this message.

Regards,
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html


