L2/08-083

Date/Time: Fri Jan 25 17:15:51 CST 2008
Contact: andy.heninger@gmail.com
Name: Andy Heninger
Report Type: Public Review Issue
Opt Subject: UAX 29 Unicode 5.1 (draft 5) Feedback

The Word Boundary rules in UAX 29 for Unicode 5.1, as they currently stand, do not force an unconditional boundary after a hard break (CR, LF, etc.). Instead, a hard break combines with any Extend, Spacing Mark, or Format characters that follow it.

This is not new behavior for word boundaries, but it differs from that of the other boundary types, whose rules explicitly call out the behavior of hard breaks.

Assuming that the existing behavior is what we want, the WordBreakTest.txt file needs to be modified as follows:

--- WordBreakTest.txt 2008-01-25 14:52:45.000000000 -0800
+++ WordBreakTest.fixed.txt 2008-01-23 16:10:05.000000000 -0800
@@ -47,9 +47,9 @@
 ÷ 000D ÷ 000D ÷ # ÷ [0.2] (CR) ÷ [999.0] (CR) ÷ [0.3]
 ÷ 000D × 000A ÷ # ÷ [0.2] (CR) × [3.0] (LF) ÷ [0.3]
 ÷ 000D ÷ 0001 ÷ # ÷ [0.2] (CR) ÷ [999.0] (Control) ÷ [0.3]
-÷ 000D ÷ 0300 ÷ # ÷ [0.2] (CR) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
+÷ 000D × 0300 ÷ # ÷ [0.2] (CR) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
 ÷ 000D ÷ 0085 ÷ # ÷ [0.2] (CR) ÷ [999.0] (Control_Sep) ÷ [0.3]
-÷ 000D ÷ 00AD ÷ # ÷ [0.2] (CR) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
+÷ 000D × 00AD ÷ # ÷ [0.2] (CR) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
 ÷ 000D ÷ 3031 ÷ # ÷ [0.2] (CR) ÷ [999.0] VERTICAL KANA REPEAT MARK (Katakana) ÷ [0.3]
 ÷ 000D ÷ 0041 ÷ # ÷ [0.2] (CR) ÷ [999.0] LATIN CAPITAL LETTER A (ALetter) ÷ [0.3]
 ÷ 000D ÷ 003A ÷ # ÷ [0.2] (CR) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
@@ -70,9 +70,9 @@
 ÷ 000A ÷ 000D ÷ # ÷ [0.2] (LF) ÷ [999.0] (CR) ÷ [0.3]
 ÷ 000A ÷ 000A ÷ # ÷ [0.2] (LF) ÷ [999.0] (LF) ÷ [0.3]
 ÷ 000A ÷ 0001 ÷ # ÷ [0.2] (LF) ÷ [999.0] (Control) ÷ [0.3]
-÷ 000A ÷ 0300 ÷ # ÷ [0.2] (LF) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
+÷ 000A × 0300 ÷ # ÷ [0.2] (LF) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
 ÷ 000A ÷ 0085 ÷ # ÷ [0.2] (LF) ÷ [999.0] (Control_Sep) ÷ [0.3]
-÷ 000A ÷ 00AD ÷ # ÷ [0.2] (LF) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
+÷ 000A × 00AD ÷ # ÷ [0.2] (LF) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
 ÷ 000A ÷ 3031 ÷ # ÷ [0.2] (LF) ÷ [999.0] VERTICAL KANA REPEAT MARK (Katakana) ÷ [0.3]
 ÷ 000A ÷ 0041 ÷ # ÷ [0.2] (LF) ÷ [999.0] LATIN CAPITAL LETTER A (ALetter) ÷ [0.3]
 ÷ 000A ÷ 003A ÷ # ÷ [0.2] (LF) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
@@ -139,9 +139,9 @@
 ÷ 0085 ÷ 000D ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] (CR) ÷ [0.3]
 ÷ 0085 ÷ 000A ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] (LF) ÷ [0.3]
 ÷ 0085 ÷ 0001 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] (Control) ÷ [0.3]
-÷ 0085 ÷ 0300 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
+÷ 0085 × 0300 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] COMBINING GRAVE ACCENT (Extend) ÷ [0.3]
 ÷ 0085 ÷ 0085 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] (Control_Sep) ÷ [0.3]
-÷ 0085 ÷ 00AD ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
+÷ 0085 × 00AD ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] SOFT HYPHEN (Control_Format) ÷ [0.3]
 ÷ 0085 ÷ 3031 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] VERTICAL KANA REPEAT MARK (Katakana) ÷ [0.3]
 ÷ 0085 ÷ 0041 ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] LATIN CAPITAL LETTER A (ALetter) ÷ [0.3]
 ÷ 0085 ÷ 003A ÷ # ÷ [0.2] (Control_Sep) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
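
For readers checking the proposed change by hand, the following is a minimal sketch of how one data line in this test-file style could be parsed. It assumes that a DIVISION SIGN (U+00F7) marks a boundary, a MULTIPLICATION SIGN (U+00D7) marks no boundary, and that everything after "#" is a comment. The function name parse_break_test_line is hypothetical and is not part of any Unicode tool.

```python
# Sketch only: parse one data line in the WordBreakTest.txt style.
# Assumptions: U+00F7 marks a boundary, U+00D7 marks no boundary, and
# text after "#" is a comment. parse_break_test_line is a hypothetical name.

BREAK = "\u00f7"     # DIVISION SIGN: boundary expected here
NO_BREAK = "\u00d7"  # MULTIPLICATION SIGN: no boundary expected here


def parse_break_test_line(line):
    """Return (code_points, boundaries) for one test line.

    boundaries[i] is True when a boundary is expected before code_points[i];
    the final entry describes the position at the end of the text.
    """
    data = line.split("#", 1)[0].split()
    if len(data) < 3 or len(data) % 2 == 0:
        raise ValueError("expected alternating markers and code points")
    markers, hex_values = data[0::2], data[1::2]
    if any(m not in (BREAK, NO_BREAK) for m in markers):
        raise ValueError("unknown boundary marker")
    code_points = [int(h, 16) for h in hex_values]
    boundaries = [m == BREAK for m in markers]
    return code_points, boundaries


# CR followed by COMBINING GRAVE ACCENT, before and after the proposed fix.
old = parse_break_test_line(f"{BREAK} 000D {BREAK} 0300 {BREAK} # CR, Extend")
new = parse_break_test_line(f"{BREAK} 000D {NO_BREAK} 0300 {BREAK} # CR, Extend")
```

Comparing the two results shows the only difference the diff introduces for this pair: the boundary between CR and the Extend character is expected in the old line and not expected in the fixed line.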