L2/07-389

Subject: Prepending characters in XCCS
From:    Mark Davis
Date:    2007-10-14

In the discussion of the Extended Combining Character Sequence (XCCS) at the last UTC meeting, we retracted the part of the proposed changes (http://www.unicode.org/reports/tr29/tr29-12.html) that dealt with prepending characters. I did some more digging and was able to verify that, for Thai, the characters in the two sets listed below are considered fundamentally part of the same letter as an adjacent character. I therefore think we would provide a more generally applicable definition of Extended Combining Character Sequence if we added a Prepend property value containing the first set and included all of the characters in the second set in the Extend property value (which is the same as Grapheme_Extend = true). The same would apply to the corresponding Lao characters.

The information I found was:

In getting the nth "letter":

1. You would never break after any of these characters:

0E40 ( เ ) THAI CHARACTER SARA E
0E41 ( แ ) THAI CHARACTER SARA AE
0E42 ( โ ) THAI CHARACTER SARA O
0E43 ( ใ ) THAI CHARACTER SARA AI MAIMUAN
0E44 ( ไ ) THAI CHARACTER SARA AI MAIMALAI

The additional rule for these would be exactly analogous to the rule for Extend:

Only for extended combining character sequences: Do not break after Prepending characters.

GB9b.   Prepend  ×

2. And you would never break before any of these characters. Most of these are already in Extend; the ones marked ** would be additions.

** 0E30 ( ะ ) THAI CHARACTER SARA A
0E31 ( ั ) THAI CHARACTER MAI HAN-AKAT
** 0E32 ( า ) THAI CHARACTER SARA AA
** 0E33 ( ำ ) THAI CHARACTER SARA AM
0E34 ( ิ ) THAI CHARACTER SARA I
0E35 ( ี ) THAI CHARACTER SARA II
0E36 ( ึ ) THAI CHARACTER SARA UE
0E37 ( ื ) THAI CHARACTER SARA UEE
0E38 ( ุ ) THAI CHARACTER SARA U
0E39 ( ู ) THAI CHARACTER SARA UU
0E3A ( ฺ ) THAI CHARACTER PHINTHU

** 0E45 ( ๅ ) THAI CHARACTER LAKKHANGYAO

0E47 ( ็ ) THAI CHARACTER MAITAIKHU
0E48 ( ่ ) THAI CHARACTER MAI EK
0E49 ( ้ ) THAI CHARACTER MAI THO
0E4A ( ๊ ) THAI CHARACTER MAI TRI
0E4B ( ๋ ) THAI CHARACTER MAI CHATTAWA
0E4C ( ์ ) THAI CHARACTER THANTHAKHAT
0E4D ( ํ ) THAI CHARACTER NIKHAHIT
0E4E ( ๎ ) THAI CHARACTER YAMAKKAN
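
The two rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the UAX #29 algorithm: it covers only the Thai characters listed in this document (Lao is omitted), it ignores every other boundary rule (CR/LF, Hangul, controls), and the names PREPEND, EXTEND, no_break_between, split_letters and nth_letter are hypothetical, introduced here for the example.

```python
# Hypothetical property sets, copied from the two lists in this document.
# Thai only; the corresponding Lao characters are not included.
PREPEND = frozenset(range(0x0E40, 0x0E45))   # 0E40..0E44
EXTEND = frozenset(
    list(range(0x0E30, 0x0E3B))              # 0E30..0E3A (incl. the ** additions)
    + [0x0E45]                               # ** LAKKHANGYAO
    + list(range(0x0E47, 0x0E4F))            # 0E47..0E4E
)


def no_break_between(prev, cur):
    """True if no boundary is allowed between two adjacent characters.

    Proposed GB9b:  Prepend x   (never break after a Prepend character)
    Extend rule:    x Extend    (never break before an Extend character)
    """
    return ord(prev) in PREPEND or ord(cur) in EXTEND


def split_letters(text):
    """Split text into 'letters' using only the two rules above."""
    letters = []
    for ch in text:
        if letters and no_break_between(letters[-1][-1], ch):
            letters[-1] += ch      # stay inside the current letter
        else:
            letters.append(ch)     # start a new letter
    return letters


def nth_letter(text, n):
    """Return the nth (0-based) 'letter' of text."""
    return split_letters(text)[n]


# SARA E + KO KAI + MAI EK + SARA AA: one letter under both rules.
print(len(split_letters("\u0e40\u0e01\u0e48\u0e32")))   # 1
# SARA AI MAIMALAI + THO THAHAN + YO YAK: two letters.
print(len(split_letters("\u0e44\u0e17\u0e22")))         # 2
```

Without the Prepend set, the first example would break after 0E40, and without the ** additions it would also break before 0E32, which is the behavior the proposal is meant to avoid.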